Location: Plant Science Research
Title: The accuracy for genomic prediction between environments and populations for soft wheat traits
Author
HUANG, MAO - The Ohio State University
WARD, BRIAN - Virginia Polytechnic Institution & State University
VAN SANFORD, DAVID - University Of Kentucky
MCKENDRY, ANNE - University Of Missouri
Brown-Guedira, Gina
TYAGI, PRIYANKA - North Carolina State University
SNELLER, CLAY - The Ohio State University
Submitted to: Crop Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/15/2018
Publication Date: 11/15/2018
Citation: Huang, M., Ward, B., Van Sanford, D., McKendry, A., Brown Guedira, G.L., Tyagi, P., Sneller, C. 2018. The accuracy for genomic prediction between environments and populations for soft wheat traits. Crop Science. 58:1-15.
Interpretive Summary: Genomic selection (GS) is a breeding methodology that combines DNA marker information with trait data to build models that let breeders predict the performance of lines that have not been evaluated in the field. To do this, GS uses data from a training population (TP) to estimate the value of lines in a selection population (SP) for which only marker data are available. However, the TP and SP are often grown in different environments, which can cause low prediction accuracy when the correlation of genetic effects between environments is low. Subsets of TP data may be more predictive than all TP data. Our objectives were 1) to evaluate the effect of using subsets of TP data on GS accuracy between environments, and 2) to assess the accuracy of models incorporating marker by environment interactions (MEI). Two wheat populations were evaluated for 11 traits in independent environments and genotyped with SNP markers. Within each population-trait combination, similar environments were clustered. Trait data from one cluster were used to predict the value of the same lines in the other cluster(s) of environments. GS models were built using all TP data or subsets of markers selected for their effect and stability. We found that the between-environment GS accuracy was generally greatest using a subset of stable and significant markers: accuracy increased by up to 48% relative to using all TP data. Using optimized subsets of markers within a population can improve GS accuracy by reducing noise in the prediction data set.
Technical Abstract: Genomic selection (GS) uses training population (TP) data to estimate the value of lines in a selection population (SP). In breeding, the TP and SP are often grown in different environments, which can cause low prediction accuracy when the correlation of genetic effects between environments is low. Subsets of TP data may be more predictive than all TP data. Our objectives were 1) to evaluate the effect of using subsets of TP data on GS accuracy between environments, and 2) to assess the accuracy of models incorporating marker by environment interactions (MEI). Two wheat populations were phenotyped for 11 traits in independent environments and genotyped with SNP markers. Within each population-trait combination, environments were clustered. Data from one cluster were used as the TP to predict the value of the same lines in the other cluster(s) of environments. Models were built using all TP data or subsets of markers selected for their effect and stability. The GS accuracy using all TP data was greater than 0.25 for nine of 11 traits. The between-environment accuracy was generally greatest using a subset of stable and significant markers: accuracy increased by up to 48% relative to using all TP data. We also assessed accuracy using each population as the TP and the other as the SP. Using subsets of TP data or the MEI models did not improve accuracy between populations. Using optimized subsets of markers within a population can improve GS accuracy by reducing noise in the prediction data set.
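
The between-environment comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses simulated genotypes and phenotypes, swaps in plain ridge regression as a stand-in for a GS model, and assumes a simple selection rule (keep markers whose estimated effects share a sign across TP environments and have the largest mean magnitude). All function names, the penalty `lam`, and the fraction of markers kept are hypothetical choices made for illustration only.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Estimate marker effects by ridge regression on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def accuracy(X, effects, y_obs):
    """Prediction accuracy: Pearson correlation of predicted and observed values."""
    pred = (X - X.mean(axis=0)) @ effects
    return float(np.corrcoef(pred, y_obs)[0, 1])


def select_stable_markers(X, y_envs, lam, keep_frac=0.2):
    """Assumed rule: keep markers with same-sign effects in every TP
    environment, ranked by the absolute value of their mean effect."""
    effects = np.array([ridge_marker_effects(X, y, lam) for y in y_envs])
    same_sign = np.all(np.sign(effects) == np.sign(effects[0]), axis=0)
    score = np.abs(effects.mean(axis=0)) * same_sign
    n_keep = max(1, int(keep_frac * X.shape[1]))
    return np.argsort(score)[::-1][:n_keep]


def run_demo(seed=0):
    """Train on two simulated TP environments, predict the same lines
    in a third (SP) environment, with all markers and with a subset."""
    rng = np.random.default_rng(seed)
    n_lines, n_markers, n_stable = 200, 300, 20

    # Simulated SNP genotypes coded 0/1/2 (not real wheat data).
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)

    # A few markers carry effects shared by all environments.
    stable = np.zeros(n_markers)
    stable[:n_stable] = rng.normal(size=n_stable)

    def phenotype():
        # Environment-specific marker effects plus residual noise.
        specific = rng.normal(scale=0.15, size=n_markers)
        return X @ (stable + specific) + rng.normal(size=n_lines)

    tp_envs = [phenotype(), phenotype()]
    sp_env = phenotype()
    y_tp = np.mean(tp_envs, axis=0)
    lam = 50.0

    acc_all = accuracy(X, ridge_marker_effects(X, y_tp, lam), sp_env)

    idx = select_stable_markers(X, tp_envs, lam)
    X_sub = X[:, idx]
    acc_sub = accuracy(X_sub, ridge_marker_effects(X_sub, y_tp, lam), sp_env)
    return acc_all, acc_sub, idx


if __name__ == "__main__":
    acc_all, acc_sub, idx = run_demo()
    print(f"accuracy, all markers:   {acc_all:.3f}")
    print(f"accuracy, marker subset: {acc_sub:.3f} ({len(idx)} markers)")
```

Whether the subset outperforms the full marker set in this toy setting depends on the simulated effect sizes, the noise level, and the penalty, so the printed numbers should not be read as reproducing the reported results.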