Research Project: Database Tools for Managing and Analyzing Big Data Sets to Enhance Small Grains Breeding

Location: Plant, Soil and Nutrition Research

Title: Using public databases for genomic prediction of tropical maize lines

Author
MORAIS, PEDRO - Universidade Federal De Vicosa
AKDEMIR, DENIZ - Michigan State University
ROGERIO BRAATZ DE ANDRADE, LUCIANO - Universidade Federal De Vicosa
Jannink, Jean-Luc
FRITSCHE-NETO, ROBERTO - Universidade De Sao Paulo
BOREM, ALUIZIO - Universidade Federal De Vicosa
ALVEZ, FILIPE - Universidade De Sao Paulo
LYRA, DANILO - Universidade De Sao Paulo
GRANATO, ITALO - Universidade De Sao Paulo

Submitted to: Plant Breeding
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/4/2020
Publication Date: 8/9/2020
Citation: Morais, P., Akdemir, D., Rogerio Braatz De Andrade, L., Jannink, J., Fritsche-Neto, R., Borem, A., Alvez, F.C., Lyra, D.H., Granato, I.S. 2020. Using public databases for genomic prediction of tropical maize lines. Plant Breeding. 139(4):697-707. https://doi.org/10.1111/pbr.12827.
DOI: https://doi.org/10.1111/pbr.12827

Interpretive Summary: Public databases contain a wealth of openly available genetic and evaluation data. We tested how useful these data are for predicting the genetic values of tropical maize inbred lines for plant height and ear height. We also examined how population structure, the use of optimized training sets (OTSs), and the amount of information drawn from public databases affected prediction accuracy. In total, 29 training sets (TSs) were defined using diversity panels from the University of São Paulo and the USDA North Central Regional Plant Introduction Station, and these TSs were divided into four scenarios with different configurations. We showed that public datasets can serve as a primary TS and that population structure can change the predictive ability of genomic selection (GS). Across the four scenarios, very large or very small TSs did not give GS predictive abilities above 0.53, whereas OTSs composed of 250 individuals were sufficient to exceed this threshold. These results provide a rationale for the continued funding of public database efforts.

Technical Abstract: This study aimed (a) to test the usefulness of genomic and phenotypic information from open-access public databases for predicting the genetic values of tropical maize inbred lines for plant height and ear height, and (b) to identify how population structure, the use of optimized training sets (OTSs), and the amount of information originating from public databases affect predictive ability. To this end, 29 training sets (TSs) were defined using three diversity panels: the University of São Paulo panel (USP), which served as the validation set (VS), and the ASSO and USDA North Central Regional Plant Introduction Station (NCRPIS) panels, which are external public panels used as predictors. The TSs were divided into four scenarios with different TS configurations. We showed that public datasets can serve as a primary TS and that population structure can change the predictive ability of genomic selection (GS). Across the four scenarios, very large or very small TSs did not give GS predictive abilities above 0.53, whereas OTSs composed of 250 individuals were sufficient to exceed this threshold.
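
For readers unfamiliar with the workflow, the following is a minimal sketch of how a predictive ability for one training set might be computed. It rests on several assumptions that the abstract does not confirm: predictive ability is taken to be the Pearson correlation between predicted and observed phenotypes in the validation set, the model is a simple ridge regression on marker scores, and all data are simulated. The function names, the shrinkage parameter `lam`, and the simulated panel sizes are hypothetical and are not taken from the study.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam * I) b = X'y for marker effects (X and y already centered)."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)


def predictive_ability(X_train, y_train, X_valid, y_valid, lam=100.0):
    """Train on one panel, predict another, and return the Pearson correlation
    between predicted and observed validation phenotypes (assumed definition)."""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    effects = ridge_marker_effects(X_train - mu_x, y_train - mu_y, lam)
    predicted = mu_y + (X_valid - mu_x) @ effects
    return float(np.corrcoef(predicted, y_valid)[0, 1])


# Simulated stand-ins for a training set and a validation set; these are NOT the
# USP, ASSO, or NCRPIS panels. The training size of 250 only echoes the OTS size
# mentioned in the abstract.
rng = np.random.default_rng(0)
n_train, n_valid, n_markers = 250, 100, 500
true_effects = rng.normal(0.0, 0.1, n_markers)

# Marker scores coded 0/1/2, as is common for SNP genotypes.
X_train = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
X_valid = rng.integers(0, 3, size=(n_valid, n_markers)).astype(float)

# Phenotype = genetic value + random noise.
y_train = X_train @ true_effects + rng.normal(0.0, 1.0, n_train)
y_valid = X_valid @ true_effects + rng.normal(0.0, 1.0, n_valid)

r = predictive_ability(X_train, y_train, X_valid, y_valid)
print(f"predictive ability (simulated data): {r:.2f}")
```

Under these assumptions, comparing training-set configurations would amount to repeating the call with different `X_train` and `y_train` subsets while holding the validation set fixed.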