Location: Corn Insects and Crop Genetics Research
2023 Annual Report
Objectives
Objective 1: Improve maize trait analysis (e.g., drought and cold tolerance, disease and pest resistance), germplasm development, genetic studies, and breeding through stewardship of maize genomes, pan-genomes, genetic data, and phenotype data.
Goal 1.A: Incorporate reference-quality genome assemblies of outgroups to domesticated maize, including stress-resilient varieties, and connect pan-gene relationships between these genomes and domesticated maize through gene models and genome browsers.
Goal 1.B: Represent and integrate maize diversity through hosting maize genomes, pan-genomes, graph information, and whole-genome sequencing data.
Objective 2: Identify and curate key datasets (e.g., 3-D protein structure, pan-genome gene functions) that will enhance maize functional genome annotation, with an emphasis on the targeted curation of traits related to abiotic and biotic stress and climate change.
Goal 2.A: Integrate maize stress-response expression and trait data with MaizeGDB genomes and functional genome annotation tools.
Goal 2.B: Integrate 3-D gene model protein structures across maize genomes, compare them within a pan-gene framework, and create gene function predictions based on protein structure similarity.
Objective 3: Develop infrastructure to integrate, add value to, and visualize multi-omics datasets, enable comparative genomics, facilitate genome-to-phenome knowledge discovery, and provide analysis through artificial intelligence approaches and genomic discovery tools.
Goal 3.A: Provide comparative and pan-genome resources to understand diversity and organize genes, and develop artificial intelligence approaches to explore the complex relationship between genotype and phenotype.
Objective 4: Provide community support services, build strategic partnerships, and provide database training and outreach activities for user communities and stakeholders.
Goal 4.A: Facilitate communication among maize researchers to support research community needs and create and leverage synergistic activities with other databases and plant research communities.
Approach
The Maize Genetics and Genomics Database (MaizeGDB – http://www.maizegdb.org) is the model organism database for maize. MaizeGDB's overall aim is to provide long-term storage, support, and stability for the maize research community's data and to provide informatics services for access, integration, visualization, and knowledge discovery. The MaizeGDB website, database, and underlying resources allow plant researchers to understand basic plant biology, make genetic improvements, facilitate breeding efforts, and translate those findings into products that increase crop quality and production. To accelerate research and breeding progress, generated data must be made freely and easily accessible. Curation of high-quality and high-impact datasets has been the foundation of the MaizeGDB project since its inception over 25 years ago. MaizeGDB serves as a two-way conduit for moving maize research data to and from our stakeholders: the maize research community uses data at MaizeGDB to facilitate their research, and in return, their published data are curated at MaizeGDB. The information and data provided at MaizeGDB, and promoted through outreach, have been used directly in research with broad commercial, social, and academic impacts. The MaizeGDB team will make high-quality, actively curated, and reliable genetic, genomic, and phenotypic datasets accessible. At the root of high-quality genome annotation lie well-supported assemblies and annotations. For this reason, we focus our efforts on benefiting researchers by developing a system to ensure long-term stewardship of both a representative reference genome sequence assembly, with its associated structural and functional annotations, and additional reference-quality genomes that help represent the diversity of maize. In addition, we will enable researchers to access data in a customized and flexible manner by deploying tools that allow direct interaction with the MaizeGDB database.
Continued efforts to engage in education, outreach, and organizational needs of the maize research community will involve the creation and deployment of video and one-on-one tutorials, updating maize Cooperators on developments of interest to the community, and supporting the information technology needs of the Maize Genetics Executive Committee and Annual Maize Genetics Conference Steering Committee.
Progress Report
ARS scientists from the Maize Genetics and Genomics Database (MaizeGDB) team in Ames, Iowa, provide valuable tools and resources for investigative research and crop improvement by leveraging maize genetics, genomics, and breeding data. For Objective 1, MaizeGDB continues to expand its stewardship efforts to encompass a wider range of high-quality genome sequences, including genome assemblies and annotations from closely related species. These resources represent the rich diversity of maize genomic and genetic data available to researchers, comprising over 100 genomes and more than a thousand supporting datasets. The genomes and accompanying tools will enable researchers to explore functional regions in the genome through the genome browser tool. MaizeGDB supports over 50 genome browsers and provides access to over 130 datasets for sequence similarity searches. These efforts will significantly enhance researchers' ability to analyze and leverage data from different sources and data types, facilitating targeted crop improvement.
For Objective 2, MaizeGDB continues to identify, curate, and host key datasets that contribute to the functional genome annotation of maize, with a focus on datasets related to abiotic and biotic stress and to climate change. MaizeGDB is generating a gold-standard dataset of approximately 3,000 genes, derived from 25 published studies, that captures differential gene expression in maize under nine biotic and eleven abiotic stress conditions. These curated datasets serve as valuable references for studying and understanding the genetic basis of stress tolerance and climate resilience in maize. Integrating these high-quality datasets, including 3-D protein structures and pan-genome gene functions, enriches the resources at MaizeGDB for researchers studying the impact of these factors on maize traits.
Regarding Objective 3, MaizeGDB continues to develop robust infrastructure to accommodate multi-omics datasets and facilitate knowledge discovery. The updated infrastructure will enable the integration and visualization of diverse omics data. One notable tool, Maize Feature Store, streamlines the use of machine-learning features in biological research by providing easy access to thousands of commonly used features derived from various omics data sources. The improved infrastructure empowers researchers to perform comparative genomics and gain insights into the complex relationships between genome and phenome. Additionally, artificial intelligence approaches and genomic discovery tools have been adopted, providing advanced analysis capabilities for researchers working with maize data. These methods can predict and search 3-D protein structures, identify genes associated with stress response, and assign functional labels to genes. These techniques are especially important for gaining functional insights where experimental data are scarce or absent. For example, we have used machine learning to predict phosphorylation sites within protein sequences. Phosphorylation is an important post-translational modification that regulates a variety of essential biological processes, yet experimental phosphorylation data are limited for maize and other important crops. These enhancements promote efficient and comprehensive data analysis, accelerating scientific discoveries in maize genetics and genomics.
Objective 4 focuses on MaizeGDB's role as the central hub for maize research, facilitating communication and collaboration among researchers worldwide. The MaizeGDB team actively engages with the maize research community to identify its needs and priorities so that resources and services can be tailored accordingly. Additionally, strategic partnerships have been forged with dozens of agricultural biological databases to collaborate on data standards and interoperability. Training and outreach activities help user communities and stakeholders get the full benefit of MaizeGDB. Through stewardship efforts, infrastructure development, curation of key datasets, and community support services, MaizeGDB continues to advance maize genetics, genomics, and breeding research.
Accomplishments