Project: USDA ARS

Research Project #434521

Research Project: MaizeGDB: Enabling Access to Basic, Translational, and Applied Research Information

Location: Corn Insects and Crop Genetics Research

2022 Annual Report

Objectives
Objective 1: Accelerate maize trait analysis, germplasm analysis, genetic studies, and breeding through stewardship of maize genomes, genetic data, genotype data, and phenotype data.
Objective 2: Develop an infrastructure to curate, integrate, query, and visualize the genetic, genomic, and phenotypic relationships in maize germplasm.
Objective 3: Identify and curate key datasets for benchmarking genomic discovery tools for the functional annotation of maize genomes, for agronomic trait analyses, for breeding (including genome editing), and for improving database interoperability.
Objective 4: Provide community support services, training and documentation, meeting coordination, support for community elections and surveys, and support for the crop genome database community.
Objective 5: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability.

Approach
The Maize Genetics and Genomics Database (MaizeGDB – http://www.maizegdb.org) is the model organism database for maize. MaizeGDB’s overall aim is to provide long-term storage, support, and stability for the maize research community’s data and to provide informatics services for data access, integration, visualization, and knowledge discovery. The MaizeGDB website, database, and underlying resources allow plant researchers to understand basic plant biology, make genetic enhancements, facilitate breeding efforts, and translate those findings into products that increase crop quality and production. To accelerate research and breeding progress, generated data must be made freely and easily accessible. Curation of high-quality, high-impact datasets has been the foundation of the MaizeGDB project since its inception over 25 years ago. MaizeGDB serves as a two-way conduit for moving maize research data to and from our stakeholders: the maize research community uses data at MaizeGDB to facilitate its research, and in return, its published data are curated at MaizeGDB. The information and data provided at MaizeGDB and shared through outreach have been used directly in research that has had broad commercial, social, and academic impacts. The MaizeGDB team will make high-quality, actively curated, and reliable genetic, genomic, and phenotypic description datasets accessible. Well-supported genome assemblies and annotations are the foundation of high-quality genomic data. For this reason, we focus our efforts on benefiting researchers by developing a system to ensure long-term stewardship of both a representative reference genome sequence assembly, with its associated structural and functional annotations, and additional reference-quality genomes that help represent the diversity of maize. In addition, we will enable researchers to access data in a customized and flexible manner by deploying tools that allow direct interaction with the MaizeGDB database.
Continued efforts to address the education, outreach, and organizational needs of the maize research community will involve creating and deploying video and one-on-one tutorials, updating maize Cooperators on developments of interest to the community, and supporting the information technology needs of the Maize Genetics Executive Committee and the Annual Maize Genetics Conference Steering Committee.

Progress Report
The project is responsible for developing the Maize Genetics and Genomics Database (MaizeGDB), whose tools and resources make the maize genome sequence useful for investigative research and crop improvement. The objectives of the MaizeGDB team are to provide stewardship of key datasets related to maize genetics, genomics, and breeding; develop robust infrastructure to store, query, integrate, and visualize data; curate high-quality, high-impact datasets; interact with the maize research community to identify needs and priorities; and work with outside communities and databases to coordinate on data standards and interoperability. MaizeGDB’s stewardship efforts have focused on using a pan-genomic approach to host and annotate high-quality genome sequences. MaizeGDB now hosts a maize pan-genome based on 50 reference-quality maize genome assemblies for over 40 unique cultivars, along with data from thousands of other public maize lines, representing the broad diversity of maize genomic and genetic data (Objective 1). MaizeGDB continues to curate datasets associated with functional regions in the genome (more than 1,000 datasets to date), which can be explored visually through a set of genome browser tools (Objectives 1-3). MaizeGDB now supports over 40 genome browsers for recently released maize genomes and over 140 datasets that can be used as targets for sequence similarity searches (Objective 2). Researchers can use these data to compare genomes with different phenotypes for a particular trait of interest and identify genomic changes associated with phenotypic differences (Objectives 2 & 3). Data from four forward-genetics resources have been remapped to the latest genome assemblies, capturing hundreds of improved loci for insertion mutants. Data are also generated and collected to represent how genes and proteins are expressed under different conditions and across multiple genome assemblies.
This allows researchers to leverage research outcomes from many sources and data types, all within the context of the maize reference assemblies. Three-dimensional protein structure images (predicted by AlphaFold) are available at MaizeGDB for over 39,000 maize proteins (Objectives 2 & 3). In collaboration with Iowa State University (NACA: 5030-21000-068-09S), a gene expression viewer is available for 26 maize genomes with over 750 gene expression experiments (Objective 2). This tool helps researchers determine how genes are regulated and which genes to select for targeted crop improvement. MaizeGDB continues to identify key datasets to include in a tool co-developed with the University of Missouri (NACA: 5030-21000-068-011S), which enables researchers to integrate their own data with publicly available data (Objectives 2 & 3). This tool makes it easy for researchers to query and analyze data from many different studies. MaizeGDB continuously improves its methodology for curating, hosting, and integrating large sets of genome assemblies (Objective 1). MaizeGDB has enhanced its tools for annotating gene structures, visualizing structural variation, storing gene-based features, visualizing and comparing gene expression, and linking genes and other data across genomes (Objective 2). Additional curation efforts have targeted datasets supporting mutant maize populations and improved gene structures, as well as other datasets related to functionally important regions of the genome (Objective 3). MaizeGDB continues to serve as the community hub for maize research: it coordinates activities, plans the annual conference, and supports the maize research community, including new initiatives that foster diversity and inclusion to create a more equitable scientific community (Objective 4). MaizeGDB also builds synergy with other crop and plant research communities (Objective 5).
Work carried out by the MaizeGDB team has improved communication among maize researchers worldwide, increased researchers’ ability to document experimental results, and increased the availability of information related to high-impact research.

Accomplishments
1. Development of a pan-genomic approach to explore the diversity of maize. The development of high-yielding, resilient germplasm continues to be of paramount importance for reducing global food insecurity, improving access to proper nutrition, and curtailing the economic impacts of destructive diseases, pests, and environmental extremes. Breeders and farmers have taken advantage of the rich diversity of the maize genome to develop better varieties that meet these demands. ARS researchers at Ames, Iowa, developed an approach to represent the genetic and genomic relationships (pan-genome) of multiple diverse maize genomes (published in BMC Plant Biology), which facilitates exploration of the maize genome and allows maize breeders and researchers to connect traits to genes. Resources are available for over 40 genomes representing distinct maize cultivars. One example tool is qTeller, a platform for comparing how, when, where, and under what conditions genes are expressed across different cultivars. The pan-genomic resource allows researchers to understand basic plant biology, accelerate the pace of genetic enhancement and breeding, and translate those findings into products that increase crop quality and production.

Review Publications
Woodhouse, M.H., Cannon, E.K., Portwood II, J.L., Harper, E.C., Gardiner, J.M., Schaeffer, M.L., Andorf, C.M. 2021. A pan-genomic approach to genome databases using maize as a model system. BMC Plant Biology. 21: Article 385. https://doi.org/10.1186/s12870-021-03173-5.
Hufford, M.B., Seetharam, A.S., Woodhouse, M.H., Chougule, K.M., Ou, S., Liu, J., Ricci, W.A., Guo, T., Olson, A., Qiu, Y., Portwood II, J.L., Cannon, E.K., Andorf, C.M., Ware, D., Dawe, R.K. et al. 2021. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science. 373(6555):655-662. https://doi.org/10.1126/science.abg5289.
Woodhouse, M.H., Sen, S., Schott, D., Portwood II, J.L., Freeling, M., Walley, J.W., Andorf, C.M., Schnable, J.C. 2021. qTeller: A tool for comparative multi-genomic gene expression analysis. Bioinformatics. 38(1):236-242. https://doi.org/10.1093/bioinformatics/btab604.
