
Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

2021 Annual Report


Objectives
Objective 1: Create approaches and tools for identifying causal variants directly from genomic sequencing of diverse germplasm and species of C4 crops. [NP301, C1, PS1A]
Objective 2: Identify deleterious mutations, and model their impact on crop efficiency and heterosis in C4 crops. [NP301, C3, PS3A]
Objective 3: Identify adaptive variants for drought and temperature tolerance across C4 crops. [NP301, C1, PS1B]
Objective 4: Establish community tools for processing and integration of sequence haplotypes to estimate their breeding effects in crop productivity. [NP301, C4, PS4A]


Approach
Increasing grass crop productivity is key to feeding the world over the next 50 years. Doing so will require removing the deleterious variants in every genome, as well as adapting the crops to highly variable and stressful environments. This project will build better breeding models for improving and adapting maize and sorghum by surveying the natural variation across their entire group of wild relative species, the Andropogoneae. With over 1,000 species, the Andropogoneae are the most productive and water-use efficient plants in the world. Yet, for applied purposes, we have only tapped the variation from a handful of species. This project will lead an effort to survey DNA-level variation across this entire clade and analyze the variation with statistical and machine learning approaches. This will allow us to develop two sets of applied models for maize and sorghum. First, we will quantitatively estimate the deleterious impact on yield for every nucleotide in the genome. Second, we will identify the genes with a high capacity for adaptation to drought, flooding, and temperature stress, and characterize their properties. These approaches and models will be deployed via integration with big data bioinformatics. This project will produce DNA-level knowledge that can be used across breeding programs and crops, and applied through either genomic selection or genome editing.


Progress Report
This year our genomic efforts focused on sequencing wild species in the maize and sorghum clade, the Andropogoneae. Working with USDA collaborators (Stoneville, Mississippi) and others, we have assembled 24 species to high quality, and an additional 21 species are in process. We have also sampled and sequenced 49 diverse maize inbred lines, selected for their high diversity and/or importance in the Genomes to Fields and Germplasm Enhancement of Maize (GEM) projects. While the current generation of long read DNA sequencers is extremely powerful, output still varies tremendously between samples and between runs. This variation is caused by carryover contaminants from long DNA preps (plants have a wide range of secondary metabolites) and by inconsistencies in DNA sequencer technology and reagents. Finally, we have performed short read DNA sequencing on 350 Andropogoneae species from herbarium specimens and other field collections.
Along with these massive genomic datasets come two challenges: variation in quality, and alignment. In terms of quality, the datasets, while expansive, have biases and genomic regions that are not fully sequenced. These inconsistencies have pushed us to develop a bioinformatic pipeline that scaffolds entire genomes accurately using pangenomes as a reference. This pipeline allows us to upgrade both our assemblies and other assemblies to higher quality. As for alignment, with nearly 500 genomes of maize and related species available, sequence alignment is one of the most important steps in making inferences from these data. However, it is still extremely challenging to align and compare the large intergenic regions of two separate species.
By combining a new dynamic programming algorithm with whole genome alignment, we have created the most sensitive whole genome aligner, which is providing critical insights into the evolution and function of regulatory regions of the genome.
How an organism grows and performs is substantially the product of the level and timing of gene expression and the activity of each protein. In three studies this year, we showed that modeling RNA expression, protein translation, and protein structure resulted in substantial improvements in genomic prediction of field performance. First, we used a novel approach to statistically estimate RNA expression variation caused by genetic variants near the gene, and then used these estimates to predict field traits in other populations. Notably, the genetic variation near genes (cis-variation) produces consistent RNA expression effects across tissues and conditions. With these estimated cis-RNA effects, we saw substantial improvements in genomic prediction across all traits, suggesting that modeling RNA expression for all genes is key to accurate genomic prediction. Second, we studied how natural variation impacts translation of RNA to protein and found that upstream open reading frames both carried the most deleterious mutations and were key for adaptation. Third, our modeling of deleterious mutations has previously focused on the conservation of amino acids. This year, we combined machine learning models to predict deleterious impact using both conservation and structure, and we saw substantial improvements in the prediction accuracy of hybrid vigor and yield.
In seven papers this year, we evaluated how maize, sorghum, and other plants adapt to their local environments. First, using a novel statistical approach that compares population differentiation for RNA expression variation to that for DNA variation, we showed that RNA changes drive substantial local adaptation.
Second, we demonstrated that the adaptation is frequently controlled at conserved non-coding regions which are shared among related grasses. As our genomic sequencing expands to hundreds of Andropogoneae genomes, this approach will become extremely powerful in helping us understand adaptive variation. Third, in two studies led by collaborators, we saw the impact of RNA expression on adaptation to drought in both maize and sorghum. Finally, in three studies, we have evaluated how evolution adapts proteins to various environmental temperatures. Our strategy was to develop models at the single residue level in microbes and then apply those models to homologous proteins in plants. A third of the amino acids in each protein are temperature sensitive. These microbially developed models correctly predicted how an essential maize gene in phospholipid metabolism helps provide maize adaptation to various temperature regimes. In a comparison of maize, Arabidopsis, and poplar, we also saw evidence for temperature adaptation among different plant organs (e.g., roots versus leaves), among different organelles, and based on the environmental history of each species.
Quantitative genetics over the last two decades has focused on either the individual or the combined impact of genetic variants on phenotype. However, we know there are many synergistic and non-linear effects between genetic variants within a single gene. Now that we can fully reconstruct genomes, we are tackling the bioinformatics to model entire gene haplotypes. Our system for modeling haplotypes across a species, the Practical Haplotype Graph (PHG), has been substantially enhanced to work with massive datasets now typified by maize. It also now supports integration with the public Breeding API standards and works with the R computing environment. We have helped develop PHGs for maize, wheat, sorghum, and cassava.
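The motivation for modeling whole gene haplotypes rather than single variants can be illustrated with a toy calculation. The following is a minimal sketch under stated assumptions, not the PHG implementation or its API: the function names and the data are hypothetical, and the example only shows how a synergistic effect between two variants in one gene is missed by summing single-site effects but captured by grouping lines on their full haplotype.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(gene_variants, phenotypes):
    """Average the phenotype of all lines that share the same gene haplotype.

    gene_variants: one tuple per inbred line with the allele (0/1) at each
                   variant site within the gene.
    phenotypes:    trait values in the same order.
    """
    groups = defaultdict(list)
    for hap, y in zip(gene_variants, phenotypes):
        groups[tuple(hap)].append(y)
    return {hap: mean(ys) for hap, ys in groups.items()}


def additive_prediction(gene_variants, phenotypes, hap):
    """Predict a haplotype's value by summing single-site allele effects."""
    grand = mean(phenotypes)
    pred = grand
    for site, allele in enumerate(hap):
        carriers = [y for g, y in zip(gene_variants, phenotypes)
                    if g[site] == allele]
        pred += mean(carriers) - grand  # marginal effect of this allele
    return pred


# Hypothetical toy data: two sites in one gene. The trait drops only when
# both alternate alleles occur together (a synergistic effect).
variants = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
trait = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 4.0, 4.0]

hap_effects = haplotype_means(variants, trait)
additive_11 = additive_prediction(variants, trait, (1, 1))

print("haplotype mean for (1, 1):", hap_effects[(1, 1)])
print("additive prediction for (1, 1):", additive_11)
```

In this toy case the haplotype-level estimate recovers the observed value for the double-alternate haplotype, while the site-by-site sum overestimates it. Real haplotype models must also handle many haplotypes per gene, missing data, and limited replication, none of which this sketch attempts.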
Our TASSEL software package continues to be among the most popular tools for analysis of functional diversity, and we have begun development and public release of TASSEL 6, which starts the paradigm shift from focusing on large numbers of genetic variants to focusing on how these variants combine to create different functional haplotypes.
Breeding Insight (BI) is the ARS initiative to increase the adoption of genomics, phenomics, and analytics tools (including data management software) in ARS specialty crop and animal breeding programs, which have lagged behind major crop and animal breeding programs. BI is currently in year three of a pilot phase focused on building support services for six ARS breeding programs (blueberry, table grape, sweetpotato, alfalfa, rainbow trout, and North American Atlantic salmon), with the future goal of expanding to all ARS specialty crop, animal, and natural resource breeding programs. As COVID restricted travel, BI focused on 1) training and support for Field Book deployment in blueberry, alfalfa, and sweetpotato for the 2021 field season, 2) validation of custom 3K marker panels for blueberry, alfalfa, and Atlantic salmon, 3) genome sequencing on ARS sweetpotato for marker creation, and 4) loading of historical breeding program data for grape, sweetpotato, and both salmonids into their own BreedBase instances. Other accomplishments of smaller note include creation of a genotypic analysis pipeline to assist with marker-assisted selection in grape (though the solution is flexible enough to be expanded to other species), the initiation of a sweetpotato weevil sequencing project to allow the breeder to identify wild populations of this endemic pest in their fields, and a completely revamped website to better serve BI stakeholders. BI's second significant software development accomplishment is the release of two open-source applications that allow the public to test and use BI code in a user-friendly interface.
These “sandbox sites” are critical both for testing new code and for refining the interface to better suit the wide variety of breeders that BI serves. The software team has made major improvements to the back-end communications between Field Book and BreedBase through the Breeding API (BrAPI) connection. The difficulties experienced by BI staff when importing historical data into BreedBase (while remaining BrAPI compliant) prompted the IT team to create a better and more flexible import/export solution for breeders. Working prototypes of this import tool are under refinement at BI and will be used by BI coordinators to hasten loading of any type of data into BreedBase. As with all of BI's software, it will be BrAPI 2.0 compliant, open-source, and publicly available. The IT team at the Breeding Management System has already expressed a desire to integrate this tool into their software stack.
In year 3, BI completed hiring for all the roles detailed in the proposal. These new hires included a new Breeding Coordinator, a Communication and Training Lead, a Phenomics Coordinator, a Software Q/A Specialist, and two new Application Programmers. BI also hired a Product Owner to guide and prioritize software development to align with breeders’ needs and complete the BI minimum viable product (MVP).


Accomplishments
1. Gene level modeling of RNA production improves prediction of field performance. Maize has 37,000 genes that interact to grow and respond to the environment; thus, one of the key goals of breeding is to be able to predict these interactions from just the DNA. Using over 70 million measurements of how much RNA each of these genes produces under various conditions and novel statistical approaches, ARS researchers in Ithaca, New York, (along with collaborators) have substantially improved our ability to predict how thousands of varieties of maize will grow for over two dozen traits. Importantly, this approach could be developed for many other crops using the powerful genomic tools available. In the long term, this will allow advanced genomic models to be applied to all crops.
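The general idea of predicting field traits through genetically estimated RNA expression can be sketched in two stages. This is an illustrative sketch only, assuming a single gene, a single nearby variant, and ordinary least squares; the project's actual statistical approach is not described here, and all data values and function names below are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Stage 1: in a reference panel with RNA measurements, estimate the effect
# of a variant near the gene (cis) on that gene's expression.
ref_genotype = [0, 0, 1, 1, 2, 2]  # alternate-allele dosage (hypothetical)
ref_expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
cis_intercept, cis_slope = fit_line(ref_genotype, ref_expression)


def predicted_expression(dosage):
    """Expression expected from DNA alone, using the stage 1 estimate."""
    return cis_intercept + cis_slope * dosage


# Stage 2: in a separate population with only DNA and field data, regress
# the trait on the genetically predicted expression.
train_genotype = [0, 1, 2, 0, 2]
train_trait = [100.0, 104.0, 108.5, 99.5, 107.5]
train_expr = [predicted_expression(g) for g in train_genotype]
trait_intercept, trait_slope = fit_line(train_expr, train_trait)


def predict_trait(dosage):
    """Predict the field trait for a new line from its genotype."""
    return trait_intercept + trait_slope * predicted_expression(dosage)


print("estimated cis effect on expression:", round(cis_slope, 3))
print("predicted trait at dosage 0 and 2:",
      round(predict_trait(0), 2), round(predict_trait(2), 2))
```

With one gene and one variant, the second stage is only a rescaling of the genotype. The potential benefit arises when expression is predicted for many genes at once and combined in a regularized model, which this sketch does not attempt.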

2. Breeding Insight deploys field and genomic tools for five of six specialty crop and animal species. While specialty crops and animals are a large portion of gross U.S. agricultural revenue, individually these small programs have not had access to the data capture and genomic innovations that benefit major crop and animal breeding programs and, thus, have lagged behind. The challenge is in both constructing the genomic resources (data) and in integrating and processing the billions of genomic and field data points needed to make informed decisions. This year, Breeding Insight generated genomic resources for breeding of blueberry, alfalfa, sweetpotato, sweetpotato weevil, and North American Atlantic salmon. Additionally, BI developed databases and field data collection systems for each of these species. Putting this powerful information and these genomic tools into the hands of ARS’s excellent specialty crop and animal breeders helps to improve breeding decisions and to meet public demand for more nutritious and flavorful foods.


Review Publications
Lozano, R., Gazave, E., Dos Santos, J., Valluru, R., Bandillo, N., Fernandes, S., Brown, P.J., Shakoor, N., Mockler, T., Ross-Ibarra, J., Buckler IV, E.S., Gore, M.A. 2021. Comparative evolutionary genetics of deleterious load in sorghum and maize. Nature Plants. 7:17-24. https://doi.org/10.1038/s41477-020-00834-5.
Cimen, E., Jensen, S., Buckler IV, E.S. 2020. Building a tRNA thermometer to estimate microbial adaptation to temperature. Nucleic Acids Research. 48(21):12004-12045. https://doi.org/10.1093/nar/gkaa1030.
Chen, S., Mei-Hsiu, S., Kremling, K.A., Lepak, N.K., Romay, M.C., Sun, Q., Bradbury, P., Buckler IV, E.S., Ku, H. 2020. Identification of miRNA-eQTLs in maize mature leaf by GWAS. Biomed Central (BMC) Genomics. 21(689). https://doi.org/10.1186/s12864-020-07073-0.
Ding, Y., Weckwerth, P.R., Poretsky, E., Murphy, K.M., Sims, J., Saldivar, E., Christensen, S.A., Char, S., Yang, B., Tong, A., Shen, Z., Kremling, K.A., Buckler IV, E.S., Kono, T., Nelson, D.R., Bohlmann, J., Bakker, M.G., Vaughan, M.M., Khalil, A.S., Betsiashvili, M., Briggs, S.P., Zerbe, P., Schmelz, E.A., Huffaker, A. 2020. Genetic elucidation of interconnected antibiotic pathways mediating maize innate immunity. Nature Plants. 6:1375-1388. https://doi.org/10.1038/s41477-020-00787-9.
Tu, X., Mejia-Guerra, M., Valdes Franco, J.A., Tzeng, D., Chu, P., Shen, W., Wei, Y., Dai, X., Li, P., Buckler IV, E.S., Zhong, S. 2020. Reconstructing the maize leaf regulatory network using ChIP-seq data of 104 transcription factors. Nature Communications. 11:5089. https://doi.org/10.1038/s41467-020-18832-8.
Blanc, J., Kremling, K., Buckler IV, E.S., Josephs, E. 2021. Local adaptation contributes to gene expression divergence in maize. Genes, Genomes, Genetics. 11(2):jkab004. https://doi.org/10.1093/g3journal/jkab004.
Jores, T., Tonnies, J., Wrightsman, T., Buckler IV, E.S., Cuperus, J.T., Fields, S., Queitsch, C. 2021. Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nature Plants. 7:842-855. https://doi.org/10.1038/s41477-021-00932-y.
Jarquin, D., De Leon, N., Romay, M., Bohn, M., Buckler IV, E.S., Ciampitti, I., Edwards, J.W., Ertl, D., Flint Garcia, S.A., Gore, M.A., Graham, C., Hirsch, C.N., Holland, J.B., Hooker, D., Kaeppler, S.M., Knoll, J.E., Lee, E.S., Lawrence-Dill, C.J., Lynch, J.P., Moose, S.P., Murray, S.C., Nelson, R., Rocheford, T., Schnable, J.C., Schnable, P.S., Smith, M., Springer, N., Thomison, P., Tuinstra, M., Wisser, R.J., Xu, W., Lorenz, A. 2021. Utility of climatic information via combining ability models to improve genomic prediction for yield within the genomes to fields maize project. Frontiers in Genetics. 11:592769. https://doi.org/10.3389/fgene.2020.592769.
Rogers, A.R., Dunne, J.C., Romay, M.C., Bohn, M., Buckler IV, E.S., Ciampitti, I.C., Edwards, J.W., Ertl, D., Flint Garcia, S.A., Gore, M.A., Graham, C., Hirsch, C.N., Hood, E.C., Hooker, D., Knoll, J.E., Lee, E.C., Lorenz, A., Lynch, J.P., Mckay, J., Moose, S.P., Murray, S.C., Nelson, R., Rocheford, T., Schnable, J.C., Schnable, P.S., Sekhon, R., Singh, M., Smith, M., Springer, N., Thelen, K., Thomison, P., Thompson, A., Tuinstra, M., Wallace, J., Wisser, R., Xu, W., Gilmour, A., Kaeppler, S.M., Deleon, N., Holland, J.B. 2021. The importance of dominance and genotype-by-environment interactions on grain yield variation in a large-scale public cooperative maize experiment. Genes, Genomes, Genetics. 11(2):jkaa050. https://doi.org/10.1093/g3journal/jkaa050.
Dos Santos, J.P., Fernandes, S.B., Mccoy, S., Lozano, R., Brown, P.J., Leakey, A.D., Buckler IV, E.S., Garcia, A.A., Gore, M.A. 2020. Novel bayesian networks for genomic prediction of developmental traits in biomass sorghum. Genes, Genomes, Genetics. 10(2):769-781. https://doi.org/10.1534/g3.119.400759.
Jordan, K., Bradbury, P., Miller, Z., Nyine, M., He, F., Guttieri, M.J., Brown Guedira, G.L., Buckler IV, E.S., Jannink, J., Akhunov, E., Ward, B.P., Bai, G., Bowden, R.L., Fiedler, J.D., Faris, J.D. 2021. Development of the Wheat Practical Haplotype Graph Database as a Resource for Genotyping Data Storage and Genotype Imputation. G3 Genes/Genomes/Genetics. https://doi.org/10.1101/2021.06.10.447944.
Swarts, K., Bauer, E., Glaubitz, J.C., Ho, T., Johnson, L., Li, Y., Li, Y., Miller, Z., Schon, C., Wang, T., Zhang, Z., Buckler IV, E.S., Bradbury, P. 2021. Joint analysis of days to flowering reveals independent temperate adaptations in maize. Heredity. 126:929-941. https://doi.org/10.1038/s41437-021-00422-z.
Wang, L., Huang, Y., Liu, Z., He, J., Jiang, X., He, F., Lu, Z., Yang, S., Chen, P., Yu, H., Zeng, B., Ke, L., Xie, Z., Larkin, R., Jiang, D., Ming, R., Buckler IV, E.S., Xu, Q. 2021. Somatic variations led to the selection of acidic and acidless orange cultivars. Nature Plants. https://doi.org/10.1038/s41477-021-00941-x.
Diepenbrock, C.H., Ilut, D.C., Magallanes-Lundback, M., Kandianis, C.B., Lipka, A.E., Bradbury, P., Holland, J.B., Hamilton, J.P., Wooldridge, E., Vaillancourt, B., Góngora-Castillo, E., Wallace, J.G., Cepela, J., Mateos-Hernandez, M., Owens, B.F., Tiede, T., Buckler IV, E.S., Rocheford, T., Buell, C., Gore, M.A., Dellapenna, D. 2021. Eleven biosynthetic genes explain the majority of natural variation in carotenoid levels in maize grain. The Plant Cell. 33(4):882–900. https://doi.org/10.1093/plcell/koab032.
Jensen, S., Charles, J., Muleta, K., Bradbury, P., Casstevens, T., Deshpande, S.P., Gore, M.A., Gupta, R., Johnson, L., Lozano, R., Miller, Z., Ramu, P., Rathore, A., Upadhyaya, H.D., Varshney, R., Morris, G.P., Pressoir, G., Buckler IV, E.S., Ramstein, G. 2020. A sorghum practical haplotype graph facilitates genome-wide imputation and cost effective genomic prediction. The Plant Genome. 13(1). Article e20009. https://doi.org/10.1002/tpg2.20009.
Wu, X., Feng, H., Wu, D., Yan, S., Zhang, P., Wang, W., Zhang, J., Ye, J., Dai, G., Fan, Y., Li, W., Song, B., Geng, Z., Yang, W., Chen, G., Qin, F., Terzaghi, W., Stitzer, M., Li, L., Xiong, L., Yan, J., Buckler IV, E.S., Dai, M. 2021. Using high-throughput multiple optical phenotyping to decipher the genetic architecture of maize drought tolerance. Genome Biology. 22(185):1-26. https://doi.org/10.1186/s13059-021-02377-0.
Song, B., Buckler IV, E.S., Wang, H., Wu, Y., Rees, E., Kellogg, E.A., Gates, D.J., Khaipho-Burch, M., Bradbury, P., Ross-Ibarra, J., Hufford, M.B., Romay, M. 2021. Conserved noncoding sequences provide insights into regulatory sequence and loss of gene expression in maize. Genome Research. 31:1245-1257. https://doi.org/10.1101/gr.266528.120.