
Title: Efficient use of historical data for genomic selection: a case study of rust resistance in wheat

Author
RUTKOSKI, J. - Cornell University
SINGH, R.P. - International Maize & Wheat Improvement Center (CIMMYT)
HUERTA-ESPINO, J. - Instituto Tecnologico El Llano
BHAVANI, S. - International Maize & Wheat Improvement Center (CIMMYT)
Poland, Jesse
Jannink, Jean-Luc
SORRELLS, M.E. - Cornell University

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/12/2014
Publication Date: 3/13/2015
Citation: Rutkoski, J., Singh, R., Huerta-Espino, J., Bhavani, S., Poland, J.A., Jannink, J., Sorrells, M. 2015. Efficient use of historical data for genomic selection: a case study of rust resistance in wheat. The Plant Genome. (8). DOI: 10.3835/plantgenome2014.09.0046.

Interpretive Summary: Genomic selection (GS) can accelerate wheat breeding by predicting which selection candidates have the highest value. To implement GS, a training population (TP) with both phenotypic and genotypic data is required to fit a prediction model for genotyped selection candidates (SCs). Several factors affect prediction accuracy, the relationship between the TP and the SCs being one of the most important. This study investigated the utility of a non-specific TP composed of historical data from the breeding program compared with a specific TP of more closely related, contemporary individuals, and the possibilities for optimizing the use of historical and contemporary data together. We found that the specific TP was 1.5 to 4.4 times more accurate than the historical TP. TP optimization enabled the selection of historical TP subsets that were significantly more accurate than randomly selected subsets. Retaining historical data when data on close relatives were available led to a 12% increase in accuracy at best and a 12% decrease in accuracy at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is large and of high heritability. TP optimization would be useful for identifying historical TP subsets to phenotype for additional traits. However, after updating the TP with contemporary data, discarding historical data may be warranted. More empirical studies are needed to determine whether these observations can be generalized.

Technical Abstract: Genomic selection (GS) is a new methodology that can improve wheat breeding efficiency. To implement GS, a training population (TP) with both phenotypic and genotypic data is required to train a statistical model that generates predictions for genotyped selection candidates (SCs). Several factors impact prediction accuracy, the relationship between the TP and the SCs being one of the most important. This study investigated the utility of a historical TP compared with a population-specific TP, the potential for TP optimization using historical TP subsets, and the utility of historical TP data when data on close relatives are available for training. We found that, depending on TP size, a population-specific TP was 1.5 to 4.4 times more accurate than a historical TP. TP optimization based on the mean of the generalized coefficient of determination (CDmean) or the mean prediction error variance (PEVmean) enabled the selection of historical TP subsets that were significantly more accurate than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9% increase in accuracy at best and a 12% decrease in accuracy at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. TP optimization would be useful for identifying historical TP subsets to phenotype for additional traits. However, after model updating, discarding historical data may be warranted. More empirical studies are needed to determine whether these observations represent general trends.
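
To make the CDmean idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a GBLUP model with an overall mean as the only fixed effect, and it scores each selection candidate by a simplified, per-individual coefficient of determination (the ratio of the variance of the predicted genetic value to the variance of the true genetic value) rather than the contrast-based form used in the CDmean literature. The function name `cd_mean`, the simulated marker matrix, and the variance ratio `lam` (error variance divided by genetic variance) are all illustrative assumptions.

```python
import numpy as np


def cd_mean(K, train, cand, lam):
    """Mean simplified coefficient of determination (CD) of the candidates.

    K     : genomic relationship matrix for all individuals (n x n)
    train : indices of the phenotyped training individuals
    cand  : indices of the unphenotyped selection candidates
    lam   : assumed ratio of error variance to genetic variance
    """
    train = np.asarray(train)
    cand = np.asarray(cand)

    # Phenotypic covariance of the training set, scaled by the genetic variance.
    V = K[np.ix_(train, train)] + lam * np.eye(len(train))
    Vinv = np.linalg.inv(V)

    # Projection that accounts for estimating the overall mean.
    Vinv1 = Vinv @ np.ones((len(train), 1))
    P = Vinv - (Vinv1 @ Vinv1.T) / Vinv1.sum()

    # Variance of each predicted candidate value, relative to its true variance.
    K_ct = K[np.ix_(cand, train)]
    var_pred = np.einsum("ij,jk,ik->i", K_ct, P, K_ct)
    cd = var_pred / np.diag(K)[cand]
    return float(cd.mean())


# Simulated markers (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(60, 200)).astype(float)
markers -= markers.mean(axis=0)
K = markers @ markers.T / markers.shape[1]

candidates = np.arange(40, 60)
cd_small = cd_mean(K, np.arange(0, 10), candidates, lam=1.0)
cd_large = cd_mean(K, np.arange(0, 30), candidates, lam=1.0)
print(cd_small, cd_large)
```

Under these assumptions, a subset search would call `cd_mean` on each proposed historical subset and keep the subset with the highest value. A PEVmean-style criterion would instead minimize the mean of `np.diag(K)[cand] - var_pred`, which is the prediction error variance in units of the genetic variance.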