Publication : USDA ARS

Research Project: Database Tools for Managing and Analyzing Big Data Sets to Enhance Small Grains Breeding

Location: Plant, Soil and Nutrition Research

Title: Using public databases for genomic prediction of tropical maize lines

Author

	MORAIS, PEDRO - Universidade Federal De Vicosa
	AKDEMIR, DENIZ - Michigan State University
	ROGERIO BRAATZ DE ANDRADE, LUCIANO - Universidade Federal De Vicosa
	Jannink, Jean-Luc
	FRITSCHE-NETO, ROBERTO - Universidade De Sao Paulo
	BOREM, ALUIZIO - Universidade Federal De Vicosa
	ALVEZ, FILIPE - Universidade De Sao Paulo
	LYRA, DANILO - Universidade De Sao Paulo
	GRANATO, ITALO - Universidade De Sao Paulo

Submitted to: Plant Breeding
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/4/2020
Publication Date: 8/9/2020
Citation: Morais, P., Akdemir, D., Rogerio Braatz De Andrade, L., Jannink, J., Fritsche-Neto, R., Borem, A., Alvez, F.C., Lyra, D.H., Granato, I.S. 2020. Using public databases for genomic prediction of tropical maize lines. Plant Breeding. 139(4):697-707. https://doi.org/10.1111/pbr.12827.
DOI: https://doi.org/10.1111/pbr.12827

Interpretive Summary: Public databases contain a wealth of openly available genetic and evaluation data. We tested whether these data can be used to predict genetic values for plant and ear height in tropical maize inbred lines. We also examined how population structure, the use of optimized training sets (OTSs), and the amount of information originating from public databases affected prediction accuracy. In total, 29 training sets (TSs) were defined from diversity panels from the University of São Paulo and the USDA North Central Regional Plant Introduction Station. These TSs were divided into four scenarios with different configurations. We showed that public datasets can serve as a primary TS and that population structure can modify the predictive ability of genomic selection (GS). Across the four scenarios, very large or very small sets did not provide predictive abilities over 0.53 for GS. However, OTSs composed of 250 individuals were sufficient to exceed this value. These results provide a rationale for the continued funding of public databasing efforts.

Technical Abstract: This paper had two aims: (a) to test the usefulness of genomic and phenotypic information from open-access public databases for predicting genetic values of tropical maize inbred lines for plant and ear height; and (b) to identify how population structure, the use of optimized training sets (OTSs), and the amount of information originating from public databases affect predictive ability. To this end, 29 training sets (TSs) were defined from three diversity panels: the University of São Paulo (USP) panel, which served as the validation set (VS), and the ASSO and USDA North Central Regional Plant Introduction Station (NCRPIS) panels, which are external public panels used as predictors. The TSs were divided into four scenarios with different TS configurations. We showed that public datasets can serve as a primary TS and that population structure can modify the predictive ability of genomic selection (GS). Across the four scenarios, very large or very small sets did not provide predictive abilities over 0.53 for GS. However, OTSs composed of 250 individuals were sufficient to exceed this value.
