Author
RUTKOSKI, J. - Cornell University
SINGH, R.P. - International Maize & Wheat Improvement Center (CIMMYT)
HUERTA-ESPINO, J. - Instituto Tecnologico El Llano
BHAVANI, S. - International Maize & Wheat Improvement Center (CIMMYT)
Poland, Jesse
Jannink, Jean-Luc
SORRELLS, M.E. - Cornell University
Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/12/2014
Publication Date: 3/13/2015
Citation: Rutkoski, J., Singh, R., Huerta-Espino, J., Bhavani, S., Poland, J.A., Jannink, J., Sorrells, M. 2015. Efficient use of historical data for genomic selection: a case study of rust resistance in wheat. The Plant Genome. (8). DOI: 10.3835/plantgenome2014.09.0046.
Interpretive Summary: Genomic selection (GS) can accelerate wheat breeding by predicting which selection candidates have the highest value. To implement GS, a training population (TP) with both phenotypic and genotypic data is required to fit a prediction model for genotyped selection candidates (SCs). Several factors affect prediction accuracy, and the relationship between the TP and the SCs is one of the most important. This study investigated the utility of a non-specific TP composed of historical data from the breeding program compared with a specific TP of more closely related, contemporary individuals, and the possibilities for optimizing the use of historical and contemporary data together. We found that the specific TP was 1.5 to 4.4 times more accurate than the historical TP. TP optimization enabled the selection of historical TP subsets that were significantly more accurate than randomly selected subsets. Retaining historical data when data on close relatives were available led to a 12% increase in accuracy at best and a 12% decrease in accuracy at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is large and of high heritability. TP optimization would be useful for identifying historical TP subsets on which to phenotype additional traits. However, after updating the TP with contemporary data, discarding historical data may be warranted. More empirical studies are needed to determine whether these observations can be generalized.
Technical Abstract: Genomic selection (GS) is a new methodology that can improve wheat breeding efficiency. To implement GS, a training population (TP) with both phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). Several factors impact prediction accuracy, and the relationship between the TP and the SCs is one of the most important. This study investigated the utility of a historical TP compared with a population-specific TP, the potential for TP optimization using historical TP subsets, and the utility of historical TP data when data on close relatives are available for training. We found that, depending on TP size, a population-specific TP was 1.5 to 4.4 times more accurate than a historical TP. TP optimization based on the mean of the generalized coefficient of determination (CDmean) or the mean prediction error variance (PEVmean) enabled the selection of historical TP subsets that were significantly more accurate than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9% increase in accuracy at best and a 12% decrease in accuracy at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. TP optimization would be useful for identifying historical TP subsets on which to phenotype additional traits. However, after model updating, discarding historical data may be warranted. More empirical studies are needed to determine whether these observations represent general trends.