Publication : USDA ARS

Publication #361627

Research Project: Genetic Improvement of Small Grains and Characterization of Pathogen Populations

Location: Plant Science Research

Title: Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel

Author

	SARINELLI, J - North Carolina State University
	MURPHY, J - North Carolina State University
	TYAGI, PRIYANKA - North Carolina State University
	Holland, James
	JOHNSON, JERRY - University Of Georgia
	MERGOUM, MOHAMED - University Of Georgia
	MASON, RICHARD - University Of Arkansas
	BABAR, ALI - University Of Florida
	HARRISON, STEPHEN - Louisiana State University
	SUTTON, RUSSELL - Texas A&M University
	GRIFFEY, CARL - Virginia Polytechnic Institution & State University
	Brown Guedira, Gina

Submitted to: Theoretical and Applied Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/20/2019
Publication Date: 2/24/2019
Citation: Sarinelli, J.M., Murphy, J.P., Tyagi, P., Holland, J.B., Johnson, J.W., Mergoum, M., Mason, R.E., Babar, A., Harrison, S., Sutton, R., Griffey, C.A., Brown Guedira, G.L. 2019. Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel. Theoretical and Applied Genetics. 132:1247.

Interpretive Summary: Genomic selection is a tool used by plant breeders in which field performance data are combined with DNA marker data to “train” models that predict the field performance of lines for which only DNA marker data are available. Plant breeding programs often have access to a large amount of historical data that is highly unbalanced, particularly across years. This study examined approaches to using lines tested in historical yield trials as training populations to integrate genomic data for predicting performance of untested breeding lines. We used a cross-validation method to evaluate the correlation between observed performance and predicted performance in a set of 467 winter wheat lines evaluated in the Gulf Atlantic Wheat Nursery (GAWN) from 2008 to 2016. We evaluated the impact of different training population sizes and training population selection methods (Random, Clustering, PEVmean and PEVmean1) on predictive ability. We also evaluated inclusion of markers associated with major genes as fixed effects in prediction models for heading date, plant height, and resistance to powdery mildew of wheat. Increases in predictive ability as the size of the training population increased were more evident for the Random and Clustering training population selection methods than for PEVmean and PEVmean1. The training population selection methods based on minimization of the prediction error variance (PEV) outperformed the Random and Clustering methods across all the population sizes. Major genes added as fixed effects always improved model predictive ability, with the greatest gains coming from combinations of multiple genes. Maximum predictabilities among all prediction methods were 0.64 for grain yield, 0.56 for test weight, 0.71 for heading date, 0.73 for plant height, and 0.60 for powdery mildew resistance. Our results demonstrate the utility of combining historical phenotypic records with genome-wide SNP marker data for predicting the performance of untested lines.

Technical Abstract: Plant breeding programs often have access to a large amount of historical data that is highly unbalanced, particularly across years. This study examined approaches to utilize these data sets as training populations to integrate genomic selection into existing pipelines. We used cross-validation to evaluate predictive ability in an unbalanced data set of 467 winter wheat (Triticum aestivum L.) genotypes evaluated in the Gulf Atlantic Wheat Nursery (GAWN) from 2008 to 2016. We evaluated the impact of different training population sizes and training population selection methods (Random, Clustering, PEVmean and PEVmean1) on predictive ability. We also evaluated inclusion of markers associated with major genes as fixed effects in prediction models for heading date, plant height, and resistance to powdery mildew (caused by Blumeria graminis f. sp. tritici). Increases in predictive ability as the size of the training population increased were more evident for Random and Clustering training population selection methods than for PEVmean and PEVmean1. The selection methods based on minimization of the prediction error variance (PEV) outperformed the Random and Clustering methods across all the population sizes. Major genes added as fixed effects always improved model predictive ability, with the greatest gains coming from combinations of multiple genes. Maximum predictabilities among all prediction methods were 0.64 for grain yield, 0.56 for test weight, 0.71 for heading date, 0.73 for plant height, and 0.60 for powdery mildew resistance. Our results demonstrate the utility of combining unbalanced phenotypic records with genome-wide SNP marker data for predicting the performance of untested genotypes.
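
Illustrative sketch (not part of the published abstract): the two ideas above, cross-validated predictive ability and fitting a major-gene marker as a fixed effect, can be shown on simulated data. Everything here is an assumption made for illustration: the data are randomly generated and are not the GAWN panel, ridge regression on marker effects stands in for whatever genomic prediction model the authors used, and the names `fit_predict`, `cv_predictive_ability`, `major`, and the penalty value `lam` are hypothetical. Predictive ability is taken as the correlation between observed and cross-validation predicted values, as described in the summary.

```python
import numpy as np

def fit_predict(F_tr, Z_tr, y_tr, F_te, Z_te, lam):
    """Fit unpenalized fixed effects (F) plus ridge-penalized marker effects (Z),
    then predict the held-out lines."""
    W_tr = np.hstack([F_tr, Z_tr])
    W_te = np.hstack([F_te, Z_te])
    # Penalize only the marker columns; fixed-effect columns get zero penalty.
    penalty = np.diag(np.r_[np.zeros(F_tr.shape[1]), np.full(Z_tr.shape[1], lam)])
    coef = np.linalg.solve(W_tr.T @ W_tr + penalty, W_tr.T @ y_tr)
    return W_te @ coef

def cv_predictive_ability(F, Z, y, lam, k=5, seed=0):
    """k-fold cross-validation; returns the correlation between observed
    phenotypes and predictions made while each line was held out."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_hat = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        y_hat[test_idx] = fit_predict(F[train], Z[train], y[train],
                                      F[test_idx], Z[test_idx], lam)
    return float(np.corrcoef(y, y_hat)[0, 1])

# Simulated stand-in data (assumption: inbred lines coded -1/+1 at each marker).
rng = np.random.default_rng(42)
n_lines, n_markers = 200, 300
Z = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))   # genome-wide markers
major = rng.choice([-1.0, 1.0], size=(n_lines, 1))       # marker at a major gene
y = (2.0 * major[:, 0]                                   # large major-gene effect
     + Z @ rng.normal(0.0, 0.1, n_markers)               # many small effects
     + rng.normal(0.0, 1.4, n_lines))                    # residual noise

intercept = np.ones((n_lines, 1))
lam = 200.0  # assumed ratio of residual variance to marker-effect variance

# Model A: major-gene marker shrunk together with all other markers.
r_random_only = cv_predictive_ability(intercept, np.hstack([major, Z]), y, lam)
# Model B: major-gene marker fitted as an unpenalized fixed effect.
r_fixed_major = cv_predictive_ability(np.hstack([intercept, major]), Z, y, lam)

print(f"major gene as random marker: r = {r_random_only:.2f}")
print(f"major gene as fixed effect:  r = {r_fixed_major:.2f}")
```

The only difference between the two models is whether the major-gene column is shrunk by the ridge penalty. Any gap between the two printed values reflects this simulation's settings only and says nothing about the magnitudes reported for the wheat panel.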

Plant Science Research: Raleigh, NC