Location: Plant Science Research
Title: Environment-specific genomic prediction ability in maize using environmental covariates depends on environmental similarity to training data
Author
ROGERS, ANNA - North Carolina State University
Holland, Jim
Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/6/2021
Publication Date: 12/28/2021
Citation: Rogers, A., Holland, J.B. 2021. Environment-specific genomic prediction ability in maize using environmental covariates depends on environmental similarity to training data. G3, Genes/Genomes/Genetics. https://doi.org/10.1093/g3journal/jkab440.
DOI: https://doi.org/10.1093/g3journal/jkab440
Interpretive Summary: Technology advances have made possible the collection of a wealth of genomic, environmental, and phenotypic data for use in plant breeding. Incorporation of environmental data into environment-specific genomic prediction is hindered in part because of inherently high data dimensionality. Computationally efficient approaches to combining genomic and environmental information may facilitate extension of genomic prediction models to new environments and germplasm, and better understanding of genotype-by-environment (GxE) interactions. Using genomic, yield trial, and environmental data on 1,918 unique hybrids evaluated in 59 environments from the maize Genomes to Fields project, we determined that a set of 10,153 SNP dominance coefficients and a 5-day temporal window size for summarizing environmental variables were optimal for genomic prediction using only genetic and environmental main effects. Adding marker-by-environment variable interactions required dimension reduction, and we found that reducing dimensionality of the genetic data while keeping the full set of environmental covariates was best for environment-specific genomic prediction of grain yield, leading to an increase in prediction ability of 2.7% to achieve a prediction ability of 80% across environments when data were masked at random.
We then measured how prediction ability within environments was affected under stratified training-testing sets to approximate scenarios commonly encountered by plant breeders, finding that incorporation of marker-by-environment effects improved prediction ability in cases where training and test sets shared environments, but did not improve prediction in new untested environments. The environmental similarity between training and testing sets had a greater impact on the efficacy of prediction than genetic similarity between training and test sets.
Technical Abstract: Technology advances have made possible the collection of a wealth of genomic, environmental, and phenotypic data available for use in plant breeding. Incorporation of environmental data into environment-specific genomic prediction (GP) is hindered in part because of inherently high data dimensionality. Computationally efficient approaches to combining genomic and environmental information may facilitate extension of GP models to new environments and germplasm, and better understanding of genotype-by-environment (GxE) interactions. Using genomic, yield trial, and environmental data on 1,918 unique hybrids evaluated in 59 environments from the maize Genomes to Fields project, we determined that a set of 10,187 SNP dominance coefficients and a 5-day temporal window size for summarizing environmental variables were optimal for GP using only genetic and environmental main effects. Adding marker-by-environment variable interactions required dimension reduction, and we found that reducing dimensionality of the genetic data while keeping the full set of environmental covariates was best for environment-specific GP of grain yield, leading to an increase in prediction ability of 2.7% to achieve a prediction ability of 80% when data were masked at random.
We then measured how prediction ability was affected under stratified training-testing sets to approximate scenarios commonly encountered by plant breeders, finding that incorporation of marker-by-environment effects improved prediction ability in cases where training and test sets shared environments, but did not improve prediction in new untested environments. The environmental similarity between training and testing sets had a greater impact on the efficacy of prediction than genetic similarity between training and test sets.
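
The difference between masking data at random and predicting new, untested environments can be illustrated with a minimal sketch. This is not the authors' code: the function names `random_mask_split` and `new_environment_split`, the toy hybrid and environment labels, and the 25% test fraction are all assumptions made for illustration. Each record is a (hybrid, environment) pair standing in for one yield observation.

```python
import random


def random_mask_split(records, test_fraction, seed=0):
    """Mask a random fraction of (hybrid, environment) records as the test set.

    Test records will usually come from environments that also appear in training.
    """
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def new_environment_split(records, held_out_envs):
    """Hold out every record from the given environments.

    The test environments are then entirely absent from training.
    """
    held = set(held_out_envs)
    train = [r for r in records if r[1] not in held]
    test = [r for r in records if r[1] in held]
    return train, test


# Toy data (hypothetical): 6 hybrids observed in each of 4 environments.
hybrids = [f"H{i}" for i in range(6)]
envs = [f"E{j}" for j in range(4)]
records = [(h, e) for h in hybrids for e in envs]

train_r, test_r = random_mask_split(records, test_fraction=0.25)
train_e, test_e = new_environment_split(records, held_out_envs=["E3"])

# Environments that appear in both the training and the test set.
shared_random = {e for _, e in train_r} & {e for _, e in test_r}
shared_new = {e for _, e in train_e} & {e for _, e in test_e}
print("shared environments, random masking:", sorted(shared_random))
print("shared environments, new-environment split:", sorted(shared_new))
```

Under the new-environment split the two sets share no environments by construction, which corresponds to the scenario in which the abstract reports no gain from marker-by-environment effects.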