Project : USDA ARS

Research Project #434521

Research Project: MaizeGDB: Enabling Access to Basic, Translational, and Applied Research Information

Location: Corn Insects and Crop Genetics Research

2020 Annual Report

Objectives
Objective 1: Accelerate maize trait analysis, germplasm analysis, genetic studies, and breeding through stewardship of maize genomes, genetic data, genotype data, and phenotype data.
Objective 2: Develop an infrastructure to curate, integrate, query, and visualize the genetic, genomic, and phenotypic relationships in maize germplasm.
Objective 3: Identify and curate key datasets for benchmarking genomic discovery tools for the functional annotation of maize genomes, for agronomic trait analyses, for breeding (including genome editing), and for improving database interoperability.
Objective 4: Provide community support services, training and documentation, meeting coordination, support for community elections and surveys, and support for the crop genome database community.
Objective 5: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability.

Approach
The Maize Genetics and Genomics Database (MaizeGDB – http://www.maizegdb.org) is the model organism database for maize. MaizeGDB’s overall aim is to provide long-term storage, support, and stability for the maize research community’s data and to provide informatics services for access, integration, visualization, and knowledge discovery. The MaizeGDB website, database, and underlying resources allow plant researchers to understand basic plant biology, make genetic enhancements, facilitate breeding efforts, and translate those findings into products that increase crop quality and production. To accelerate research and breeding progress, generated data must be made freely and easily accessible. Curation of high-quality and high-impact datasets has been the foundation of the MaizeGDB project since its inception over 25 years ago. MaizeGDB serves as a two-way conduit for getting maize research data to and from our stakeholders. The maize research community uses data at MaizeGDB to facilitate their research, and in return, their published data is curated at MaizeGDB. The information and data provided at MaizeGDB and facilitated through outreach have been used directly in research that has had broad commercial, social, and academic impacts. The MaizeGDB team will make accessible high-quality, actively curated, and reliable genetic, genomic, and phenotypic description datasets. At the root of high-quality genome annotation lie well-supported assemblies and annotations. For this reason, we focus our efforts on benefitting researchers by developing a system to ensure long-term stewardship of both a representative reference genome sequence assembly with associated structural and functional annotations and additional reference-quality genomes that help represent the diversity of maize. In addition, we will enable researchers to access data in a customized and flexible manner by deploying tools that enable direct interaction with the MaizeGDB database.
Continued efforts to engage in education, outreach, and organizational needs of the maize research community will involve the creation and deployment of video and one-on-one tutorials, updating maize Cooperators on developments of interest to the community, and supporting the information technology needs of the Maize Genetics Executive Committee and Annual Maize Genetics Conference Steering Committee.

Progress Report
ARS scientists working on the Maize Genetics and Genomics Database (MaizeGDB) in Ames, Iowa, provide tools and resources that make the maize genome sequence useful for investigative research and crop improvement. The objectives of the MaizeGDB team are to provide stewardship of key datasets related to maize genetics, genomics, and breeding; develop robust infrastructure to store, query, integrate, and visualize data; curate high-quality, high-impact datasets; interact with the maize research community to identify needs and priorities; and work with other communities and databases to coordinate on data standards and interoperability. MaizeGDB’s stewardship efforts have focused on high-quality genome sequences. MaizeGDB has worked closely with the University of Georgia, Cold Spring Harbor Laboratory, and Iowa State University to steward 26 high-quality, diverse maize genomes and expression datasets. MaizeGDB hosts 44 reference-quality maize genome assemblies and data for thousands of other public maize lines, representing the broad diversity of maize genomic and genetic data. MaizeGDB has ongoing efforts to curate datasets (more than 1,000 to date) that have been associated with functional regions in the genome and that can be visually explored through a genome browser tool. MaizeGDB now supports over 40 genome browsers for recently released maize genomes and 134 datasets that can be used as targets for sequence similarity searches. Data is being generated and collected to represent how genes and proteins are expressed in different conditions and across multiple genome assemblies. This allows researchers to leverage research outcomes from many different sources and data types, all within the context of the maize reference assemblies. A gene expression viewer helps researchers determine how genes are regulated and which genes to select for targeted crop improvement.
MaizeGDB continues to identify key datasets to include in a tool, co-developed with the University of Missouri, that enables researchers to integrate their data with publicly available data. This tool makes it easy for researchers to query and analyze data from many different studies. MaizeGDB has updated its infrastructure so that it can host community-developed applications. These applications have easy-to-use interfaces that provide access to data, analysis, and visualization. The applications include a tool, built in collaboration with Iowa State University, to evaluate genome assemblies with a common set of metrics and evaluation techniques; a tool to visualize blocks of shared DNA similarity across maize varieties; and a pathway association study tool (in collaboration with ARS researchers from Starkville, Mississippi) that assigns changes in DNA to genes and metabolic pathways. Additional curation efforts have targeted datasets and tools to support maize trait analysis, genetic studies, and breeding. MaizeGDB has coordinated with a consortium of over 25 agricultural biological databases to develop best practices that ensure data adheres to community-defined standards. MaizeGDB continues to be the community hub for maize research, coordinating activities and providing technical support to the maize research community. Work carried out by the MaizeGDB team has resulted in improved communication among maize researchers worldwide, increased ability to document the results of experiments, and increased availability of information relevant to high-impact research.

Accomplishments
1. The Maize Genetics and Genomics Database (MaizeGDB) released data and resources for a set of 26 diverse maize genome assemblies. Over the past decade, maize researchers have relied on a single reference genome as the genomic representation of maize. However, maize genetics, genomics, and breeding research depends upon the diversity within maize for basic research and to improve agriculturally important traits, and that diversity is not adequately represented by a single reference genome. To make multi-genome data and resources available to maize researchers, ARS researchers in Ames, Iowa, worked closely with the University of Georgia, Cold Spring Harbor Laboratory, and Iowa State University to offer 26 high-quality, diverse maize genomes and supporting data sets through MaizeGDB, the genetics and genomics database for the maize research community. These reference genomes are notable for their completeness (low number of gaps), accuracy (low number of errors), and high percentage of sequences assembled into chromosomes. The methods used to sequence and assemble the genomes are described in two papers published in Nature Communications and Genome Biology. This release includes 26 genome pages, over one million gene pages, 206 downloadable data sets, and 134 sets of sequences that can be searched based on homology. In addition, MaizeGDB upgraded to next-generation genome browsers that allow researchers to easily navigate within and across this set of genomes and visualize over 1,000 data sets corresponding to functional regions within the maize genome. The data includes over 200 agriculturally important traits associated with tens of thousands of locations across the 10 maize chromosomes. A tool used to assess the quality of the genomes and annotations has been published in the scientific journal BMC Genomics and is available for public use through MaizeGDB.
These new resources will lead to improved crop performance by helping researchers better understand the relationship between the genes in a plant and the traits observed in farmers’ fields.

Review Publications
Coffman, S.M., Hufford, M.B., Andorf, C.M., Lubberstedt, T. 2019. Haplotype structure in commercial maize breeding programs in relation to key founder lines. Theoretical and Applied Genetics. 133:547-561. https://doi.org/10.1007/s00122-019-03486-y.
Manchanda, N., Portwood II, J.L., Woodhouse, M.H., Seetharam, A., Lawrence-Dill, C.J., Andorf, C.M., Hufford, M. 2020. GenomeQC: A quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics. 21. https://doi.org/10.1186/s12864-020-6568-2.
Ou, S., Liu, J., Chougule, K., Fungtammasan, A., Seetharam, A., Stein, J., Llaca, V., Manchanda, N., Gilbert, A., Wei, S., Ware, D., Woodhouse, M.H., et al. 2020. Effect of sequence depth and length in long-read assembly of the maize inbred NC358. Nature Communications. 11. https://doi.org/10.1038/s41467-020-16037-7.
Walsh, J.R., Woodhouse, M.R., Andorf, C.M., Sen, T.Z. 2020. Tissue-specific gene expression and protein abundance patterns are associated with fractionation bias in maize. BMC Plant Biology. 20. https://doi.org/10.1186/s12870-019-2218-8.
Liu, J., Seetharam, A.S., Chougule, K.M., Ou, S., Swentowsky, K.W., Gent, J.I., Llaca, V., Woodhouse, M.H., Manchanda, N., Presting, G.G., Kudrna, D.A., Alabady, M., Hirsch, C.N., Fengler, K.A., Ware, D., Michael, T.P., Hufford, M.B., Dawe, R.K. 2020. Gapless assembly of maize chromosomes using long-read technologies. Genome Biology. 21. https://doi.org/10.1186/s13059-020-02029-9.
