
Research Project #434521

Research Project: MaizeGDB: Enabling Access to Basic, Translational, and Applied Research Information

Location: Corn Insects and Crop Genetics Research

2021 Annual Report

Objectives
Objective 1: Accelerate maize trait analysis, germplasm analysis, genetic studies, and breeding through stewardship of maize genomes, genetic data, genotype data, and phenotype data.
Objective 2: Develop an infrastructure to curate, integrate, query, and visualize the genetic, genomic, and phenotypic relationships in maize germplasm.
Objective 3: Identify and curate key datasets for benchmarking genomic discovery tools for the functional annotation of maize genomes, for agronomic trait analyses, for breeding (including genome editing), and for improving database interoperability.
Objective 4: Provide community support services, training and documentation, meeting coordination, support for community elections and surveys, and support for the crop genome database community.
Objective 5: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability.

Approach
The Maize Genetics and Genomics Database (MaizeGDB; http://www.maizegdb.org) is the model organism database for maize. MaizeGDB’s overall aim is to provide long-term storage, support, and stability to the maize research community’s data and to provide informatics services for access, integration, visualization, and knowledge discovery. The MaizeGDB website, database, and underlying resources allow plant researchers to understand basic plant biology, make genetic enhancements, facilitate breeding efforts, and translate those findings into products that increase crop quality and production. To accelerate research and breeding progress, generated data must be made freely and easily accessible. Curation of high-quality and high-impact datasets has been the foundation of the MaizeGDB project since its inception over 25 years ago. MaizeGDB serves as a two-way conduit for getting maize research data to and from our stakeholders. The maize research community uses data at MaizeGDB to facilitate their research, and in return, their published data are curated at MaizeGDB. The information and data provided at MaizeGDB and facilitated through outreach have been used directly in research that has had broad commercial, social, and academic impacts.
The MaizeGDB team will make accessible high-quality, actively curated, and reliable genetic, genomic, and phenotypic description datasets. At the root of high-quality genome annotation lie well-supported assemblies and annotations. For this reason, we focus our efforts on benefiting researchers by developing a system to ensure long-term stewardship of both a representative reference genome sequence assembly with associated structural and functional annotations and additional reference-quality genomes that help represent the diversity of maize. In addition, we will enable researchers to access data in a customized and flexible manner by deploying tools that enable direct interaction with the MaizeGDB database.
Continued efforts to support the education, outreach, and organizational needs of the maize research community will involve creating and deploying video and one-on-one tutorials, updating maize Cooperators on developments of interest to the community, and supporting the information technology needs of the Maize Genetics Executive Committee and the Annual Maize Genetics Conference Steering Committee.

Progress Report
Working on the Maize Genetics and Genomics Database (MaizeGDB), we provide tools and resources that make the maize genome sequence useful for investigative research and crop improvement. The objectives of the MaizeGDB team are to provide stewardship of key datasets related to maize genetics, genomics, and breeding; to develop robust infrastructure to store, query, integrate, and visualize data; to curate high-quality, high-impact datasets; to interact with the maize research community to identify needs and priorities; and to work with outside communities and databases to coordinate on data standards and interoperability.
MaizeGDB’s stewardship efforts have focused on high-quality genome sequences. MaizeGDB now hosts 44 reference-quality maize genome assemblies and data for thousands of other public maize lines, representing the broad diversity of maize genomic and genetic data (Objective 1). MaizeGDB has ongoing efforts to curate datasets (more than 1,000 datasets to date) that have been associated with functional regions in the genome, which can be visually explored through a genome browser tool (Objectives 1-3). MaizeGDB now supports over 40 genome browsers for recently released maize genomes and over one hundred datasets that can be used as targets for sequence similarity searches (Objective 2). Researchers can use these data to compare genomes with different phenotypes for a particular trait of interest and identify genome changes associated with phenotypic differences (Objectives 2 & 3).
Data are being generated and collected to represent how genes and proteins are expressed in different conditions and across multiple genome assemblies. This allows researchers to leverage research outcomes from many different sources and data types, all within the context of the maize reference assemblies. A gene expression viewer was updated in collaboration with Iowa State University to support 26 maize genomes with over 750 gene expression experiments (Objective 2). This tool helps researchers determine how genes are regulated and which genes to select for targeted crop improvement. MaizeGDB continues to identify key datasets to include in a tool, co-developed with the University of Missouri, that enables researchers to integrate their data with publicly available data (Objectives 2 & 3). This tool makes it easy for researchers to query and analyze data from many different studies.
MaizeGDB has improved its methodology for curating, hosting, and integrating large sets of genome assemblies (Objective 1). MaizeGDB has enhanced tools for annotating gene structures, visualizing structural variation, visualizing and comparing gene expression, and linking genes and other data across genomes (Objective 2). Additional curation efforts have targeted datasets to support mutant maize populations, improved gene structures, and other datasets related to functionally important regions in the genome (Objective 3). MaizeGDB continues to be the community hub for maize research, coordinating activities and providing technical support to the maize research community, including support for new initiatives to foster a diverse and inclusive community of maize researchers (Objective 4), and building synergy with other crop and plant research communities (Objective 5). Work carried out by the MaizeGDB team has resulted in improved communication among maize researchers worldwide, increased ability to document the results of experiments, and increased availability of information relevant to high-impact research.

Accomplishments
1. The Maize Genetics and Genomics Database (MaizeGDB) provided pan-genomic resources for a set of 44 maize genome assemblies. Research on crops increasingly relies on the complete set of genes across a species (i.e., the pan-genome). Pan-genomes are especially valuable in plants with high genomic diversity (e.g., maize) that can be exploited for crop improvement. ARS researchers at Ames, Iowa, have introduced a pan-genomic approach to hosting a genomic database, leveraging the large number of diverse maize genomes available and their associated datasets to efficiently connect plant genomes with traits of interest and their inheritance and control. Over the past several years, MaizeGDB has transitioned from a single reference genome to a multi-genome resource. MaizeGDB now hosts data and tools integrated across 44 diverse maize genome assemblies. Recent improvements at MaizeGDB include an improved method for gene annotation (published in BMC Bioinformatics), a tool to visualize structural variation across genomes, methods to connect functional annotations across genomes, and an updated resource to compare and visualize gene atlases. This new pan-genomic approach is a potential framework for other crop databases and a resource to facilitate improved crop performance by helping researchers understand the relationship between the genes in a plant and the traits observed in farmers’ fields.

Review Publications
Shamimuzzaman, M., Gardiner, J.M., Walsh, A.T., Triant, D.A., Le Tourneau, J.J., Tayal, A., Unni, D.R., Nguyen, H.H., Portwood II, J.L., Cannon, E.K., Andorf, C.M., Elsik, C.G. 2020. MaizeMine: A data mining warehouse for the maize genetics and genomics database (MaizeGDB). Frontiers in Plant Science. 11. Article 592730. https://doi.org/10.3389/fpls.2020.592730.
Banerjee, S., Bhandary, P., Woodhouse, M.H., Sen, T.Z., Wise, R.P., Andorf, C.M. 2021. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics. 22. Article 205. https://doi.org/10.1186/s12859-021-04120-9.
Marcon, C., Altrogge, L., Win, Y.N., Stöcker, T., Gardiner, J.M., Portwood II, J.L., Opitz, N., Kortz, A., Baldauf, J.A., Hunter, C.T., McCarty, D.R., Koch, K.E., Schoof, H., Hochholdinger, F. 2020. BonnMu: A sequence-indexed resource of transposon-induced maize mutations for functional genomics studies. Plant Physiology. 184(2):620-631. https://doi.org/10.1104/pp.20.00478.
Wilkey, A., Brown, A.V., Cannon, S.B., Cannon, E.K. 2020. GCViT: a method for interactive, genome-wide visualization of resequencing and SNP array data. BMC Genomics. 21. Article 822. https://doi.org/10.1186/s12864-020-07217-2.