Location: Plant, Soil and Nutrition Research
2019 Annual Report
Objectives
Objective 1: Create approaches and tools for identifying causal variants directly from genomic sequencing of diverse germplasm and species of C4 crops. [NP301, C1, PS1A]
Objective 2: Identify deleterious mutations, and model their impact on crop efficiency and heterosis in C4 crops. [NP301, C3, PS3A]
Objective 3: Identify adaptive variants for drought and temperature tolerance across C4 crops. [NP301, C1, PS1B]
Objective 4: Establish community tools for processing and integration of sequence haplotypes to estimate their breeding effects in crop productivity. [NP301, C4, PS4A]
Approach
Increasing grass crop productivity is key for feeding the world over the next 50 years, and this will require removing the deleterious variants in every genome, as well as adapting the crops to highly variable and stressful environments. This project will build better breeding models for improving and adapting maize and sorghum by surveying the natural variation across their entire group of wild relative species - the Andropogoneae. With over 1,000 species, the Andropogoneae are the most productive and water-use efficient plants in the world. Yet, for applied purposes, we have only tapped the variation from a handful of species. This project will lead an effort to survey DNA-level variation across this entire clade and analyze the variation with statistical and machine learning approaches. This will allow us to develop two sets of applied models for maize and sorghum. First, we will quantitatively estimate the deleterious impact on yield for every nucleotide in the genome. Second, we will identify the genes with a high capacity for adaptation to drought, flooding, and temperature stress, and characterize their properties. These approaches and models will be deployed via integration with big data bioinformatics. This project will produce DNA-level knowledge that can be used across breeding programs and crops, and applied through either genomic selection or genome editing.
Progress Report
To measure the functional constraint of every nucleotide in the maize and sorghum genomes, this project is comparing these species to the Andropogoneae tribe of over 1,000 species. Over the last year, along with our collaborators, we have been collecting and propagating samples and beginning to sequence the genomes of these species. Novel sequencing and bioinformatic approaches have been evaluated to produce a detailed genome analysis of 10 species and a rough analysis of 8 species. Key advances were made in reducing the cost of long-read sequencing, isolating long DNA fragments from difficult species, and assembling these DNA reads, and metrics for assessing genome sequencing quality were developed. These efforts provide a rigorous set of approaches for sequencing the DNA of the remaining species in the tribe in the coming years.
The DNA sequencing technologies used to assemble full genomes have made tremendous progress in the last year, allowing entire genomes to be sequenced for 1/1000th of their previous cost. This project collaborated with several other global efforts to sequence and assemble three genomes, and led the analysis of the community’s eighteen genomes into a practical haplotype graph. These genomes represent a substantial portion of the temperate-adapted field maize. Later in the year, an additional 26 maize genomes will be released by collaborators, curated, and publicly shared, which will provide access to tropical, sweet, and popcorn diversity. The graph has been used to identify functionally constrained regions of maize genomes and to support the U.S. Genomes To Field genotypic analyses.
The central dogma of molecular biology is that DNA sequence is transcribed into RNA, which in turn is translated into proteins, which do the work of the cell. By collecting billions of observations on maize DNA, RNA, and protein levels, researchers are applying the tools of machine learning to this space. We have made substantial progress in developing machine learning models that predict, directly from DNA sequence, its structure, which proteins bind the DNA, whether a region of DNA will produce RNA, how much RNA it produces, and how much protein is likely to be present. Nearly 200 different models for separate processes have been created. These models are providing insight into how variation in DNA sequence produces changes in RNA and protein levels, which subsequently affect field-level variation.
This project leads a number of bioinformatic efforts to support the analysis of crop diversity. The TASSEL software tools, which have been a mainstay for plant trait and genotypic analysis, were enhanced with connections to the R analysis platform, a pre-eminent statistical analysis environment. This connection improves TASSEL's interconnectedness with other systems and should greatly expand its user base. Plant genomes are frequently extremely diverse, and a graph rather than a linear representation is needed to capture this diversity. This project continued to develop the Practical Haplotype Graph (PHG) to deal with dozens of well-assembled genomes, and continued to apply the graph to maize and sorghum breeding. Finally, this project has released a range of bioinformatic tools for machine learning using two approaches.
Most ARS specialty crop and animal breeders who run breeding programs do not have the scale to fully implement modern breeding technologies, practices, and tools, all of which could help them meet the increasing demands for new varieties that can better tolerate pests, diseases, and changing weather patterns, and that match consumer demands and preferences. The Breeding Insight (BI) program will bring integrated breeding software, rapid and efficient genotyping, dynamic real-time trait data collection, and a secure data management system to five ARS breeding programs (Alfalfa, Blueberry, Grape, Sweet potato, and Salmonid fishes) in the pilot phase of the project. In the first year, the accomplishments have been: hiring most of the team, setting up the facilities, establishing our key milestones and deliverables, establishing the foundations for the underlying data management systems, and initiating development of genetic marker systems for all 5 species. This provides the foundation for all USDA-ARS specialty breeding programs to begin leveraging the tools of modern genomics and informatics.
Accomplishments
1. Successful development of two methods for training machine learning models to help researchers. Both genomics and machine learning have advanced remarkably over the last several years, but the application of machine learning to modeling genomic data is frequently confounded by the strong evolutionary signatures in the data, which prevent the development of accurate mechanistic models. These models are needed to identify genetic variation that is likely to be functional and could be used to improve varieties through either genomic selection or editing. ARS researchers in Ithaca, New York, along with collaborators, have developed two methods for training machine learning models without being confounded by evolution, and successfully applied these approaches to the prediction of gene RNA expression. These strategies can be applied to any species and a wide range of genomic problems, which should allow researchers to quickly discover the functional mechanisms and the underlying variants responsible for them.
Review Publications
Wallace, J.G., Rodgers-Melnick, E., Buckler IV, E.S. 2018. On the road to breeding 4.0: unraveling the good, the bad, and the boring of crop quantitative genomics. Annual Review of Genetics. 52(1):421-444. https://doi.org/10.1146/annurev-genet-120116-024846.
Yang, J., Mezmouk, S., Baumgarten, A., Buckler IV, E.S., Guill, K.E., McMullen, M., Mumm, R., Ross-Ibarra, J. 2017. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genetics. https://doi.org/10.1371/journal.pgen.1007019.
Zhang, D., Easterling, K., Pitra, N., Coles, M., Buckler IV, E.S., Bass, H., Matthews, P. 2017. Non-Mendelian single-nucleotide polymorphism inheritance and atypical meiotic configurations are prevalent in hop. The Plant Genome. 10(3). https://doi.org/10.3835/plantgenome2017.04.0032.
Dos Santos, J.P., Fernandes, S.B., Lozano, R., Brown, P.K., Buckler IV, E.S., Garcia, A.A., Gore, M.A. 2019. Novel Bayesian networks for genomic prediction of developmental traits in biomass sorghum. bioRxiv. https://doi.org/10.1101/677179.
Punnuri, S.M., Wallace, J.G., Knoll, J.E., Hyma, K.E., Mitchell, S.E., Buckler IV, E.S., Varshney, R.K., Singh, B.P. 2016. Development of a high-density linkage map and tagging leaf spot resistance in pearl millet using genotyping-by-sequencing markers. The Plant Genome. 9(2):1-13.
Li, B., Kremling, K., Wu, P., Bukowski, R., Romay, M., Xie, E., Buckler IV, E.S., Chen, M. 2018. Co-regulation of ribosomal RNA with hundreds of genes contributes to phenotypic variations. Genome Research. https://doi.org/10.1101/gr.229716.117.
Wang, J., Zhou, Z., Li, H., Liu, D., Zhang, Q., Bradbury, P., Buckler IV, E.S., Zhang, Z. 2018. Expanding the BLUP alphabet for genomic prediction adaptable to the genetic architectures of complex traits. Heredity. https://doi.org/10.1038/s41437-018-0075-0.
Li, Y., Chen, L., Bradbury, P., Shi, Y., Song, Y., Zhang, D., Zhang, Z., Buckler IV, E.S., Li, Y., Wang, T. 2018. Increased experimental conditions and marker densities identified more genetic loci associated with southern and northern leaf blight resistance in maize. Nature Scientific Reports. (8):6848. https://doi.org/10.1038/s41598-018-25304-z.
He, Y., Wang, M., Dukowic-Schulze, S., Zhou, A., Tiang, C., Shilo, S., Sidh Sidhu, G., Eichten, S., Bradbury, P., Springer, N., Buckler IV, E.S., Levy, A., Sun, Q., Pillardy, J., Kianian, P., Kianian, S., Chen, C., Pawlowski, W. 2017. Genomic features shaping the landscape of meiotic double-strand break hotspots in maize. Proceedings of the National Academy of Sciences. 114(46):12231-12236.
Liu, Z., Cook, J., Melia-Hancock, S., Guill, K.E., Bottoms, C., Garcia, A., Ott, O., Nelson, R., Reckerd, J., Balint Kurti, P.J., Larsson, S., Lepak, N.K., Buckler IV, E.S., Trimble, L., Tracy, W., McMullen, M.D., Flint Garcia, S.A. 2016. Expanding maize genetic resources with predomestication alleles: Maize–teosinte introgression populations. The Plant Genome. (9):1.
Walters, W.A., Jin, Z., Youngblut, N., Wallace, J.G., Sutter, J., Zhang, W., González-Peña, A., Peiffer, J., Koren, O., Shi, Q., Knight, R., Glavina Del Rio, T., Tringe, S.G., Buckler IV, E.S., Dangl, J.L., Ley, R.E. 2018. Large-scale replicated field study of maize rhizosphere identifies heritable microbes. Proceedings of the National Academy of Sciences. 115(28):7368-7373.
Diepenbrock, C., Kandianis, C., Lipka, A., Magallanes-Lundback, M., Vaillancourt, B., Gongora-Castillo, E., Wallace, J., Cepela, J., Mesberg, A., Bradbury, P., Ilut, D., Mateos-Hernandez, M., Hamilton, J., Owens, B., Tiede, T., Buckler IV, E.S., Rocheford, T., Buell, R., Gore, M., Dellapenna, D. 2017. Novel loci underlie natural variation in vitamin E levels in maize grain. The Plant Cell. 29(10):2374-2392. https://doi.org/10.1105/tpc.17.00475.
Varshney, R., Shi, C., Thudi, M., Mariac, C., Wallace, J., Qi, P., Zhang, H., Zhao, Y., Wang, X., Rathore, A., Srivastava, R., Chitikineni, A., Fan, G., Bajaj, P., Punnuri, S., Gupta, S., Wang, H., Jiang, Y., Couderc, M., Katta, M., Paudel, D., Mungra, K., Chen, W., Harris-Shultz, K.R., Garg, V., Desai, N., Doddamani, D., Kane, N., Conner, J., Ghatak, A., Chaturvedi, P., Subramaniam, S., Yadav, O., Berthouly-Salazar, C., Hamidou, F., Wang, J., Liang, X., Clotault, J., Upadhyaya, H., Cubry, P., Rhoné, B., Gueye, M., Sunkar, R., Dupuy, C., Sparvoli, F., Cheng, S., Mahala, R., Singh, B., Yadav, R., Lyons, E., Datta, S., Hash, C., Devos, K., Buckler IV, E.S., Bennetzen, J., Paterson, A.H., Ozias-Akins, P., Grando, S., Wang, J., Mohapatra, T., Weckwerth, W., Reif, J.C., Liu, X., Vigouroux, Y., Xu, X. 2017. Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments. Nature Communications. 35(10):969.