Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize

Author
Ramstein, Guillaume - Cornell University
Buckler, Edward (Ed)

Submitted to: Genome Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/15/2022
Publication Date: 9/1/2022
Citation: Ramstein, G.P., Buckler IV, E.S. 2022. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biology. 23. Article 183. https://doi.org/10.1186/s13059-022-02747-2.
DOI: https://doi.org/10.1186/s13059-022-02747-2

Interpretive Summary: In this study, we aimed to improve crop breeding and genome editing by identifying important genetic variants at a high level of detail. Existing methods generally lack the required resolution. To overcome this, we used genomic annotations to predict nucleotide conservation across different plant species as an indicator of the impact of mutations on the plant's fitness. By analyzing genomic sequences, we identified mutations in maize genes and used bioinformatics and deep learning to make predictions. These predictions were validated using experimental data on conservation within species, chromatin accessibility, and gene expression. The analysis highlights genes involved in central carbon metabolism as being particularly important for the plant's fitness and yield. Using this approach, we were able to significantly improve the prediction of fitness-related traits, such as grain yield, in hybrid maize varieties. By focusing on a small subset of genetic variants (fewer than 1%), we prioritized the sites most likely to affect key traits such as maize yield. The results suggest that predicting nucleotide conservation across different plant species could be valuable for selecting key genetic variants for accurate genomic prediction and for identifying candidate mutations for efficient genome editing. The models and predicted nucleotide conservation data are available for public use in CyVerse.

Technical Abstract: Background: Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations. Results: Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants. Conclusions: Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse.
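
Illustrative example: the abstract describes stringent prioritization of fewer than 1% of single-site variants by predicted nucleotide conservation. The sketch below shows one simple way such a top-fraction filter might look. It is not the PICNC implementation: the function name `prioritize_variants`, the `fraction` parameter, and the randomly generated scores are all hypothetical, and it assumes each variant already has a single numeric conservation score where higher means more conserved.

```python
import random


def prioritize_variants(scores, fraction=0.01):
    """Return indices of the highest-scoring variants.

    Hypothetical helper, not part of PICNC. `scores` is a list of
    predicted conservation scores (higher = more conserved, an
    assumption); `fraction` is the share of variants to keep.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at most `fraction` of the variants, but never fewer than one.
    n_keep = max(1, int(len(scores) * fraction))
    # Rank variant indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


# Made-up scores standing in for predicted conservation at 10,000 SNPs.
random.seed(0)
scores = [random.random() for _ in range(10_000)]
selected = prioritize_variants(scores, fraction=0.01)
print(len(selected))  # number of variants retained for downstream prediction
```

In a genomic prediction setting, the selected indices could then be used to subset the marker matrix before fitting a model. How the threshold was chosen in the study is not stated here, so the 1% value is only a placeholder.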