Location: Plant, Soil and Nutrition Research
Project Number: 8062-21000-051-000-D
Project Type: In-House Appropriated
Start Date: Apr 4, 2023
End Date: Apr 3, 2028
Objective:
Objective 1: Improve characterization of pan-gene space through stewardship of genomic data for sorghum, maize, rice and grapevine crop research communities.
Objective 2: Identify and curate key expression and genetic variation datasets that will enhance functional genome annotation, with an emphasis on traits related to development and to abiotic and biotic stress, and with a focus on sorghum.
Sub-objective 2a: Integration of gene expression information in collaboration with EBI Atlas.
Sub-objective 2b: Integration of gene expression information in collaboration with the Bio-Analytic Resource for Plant Biology.
Sub-objective 2c: Integration of genetic variation information.
Objective 3: Maintain and extend cyberinfrastructure to integrate, add value, and visualize multi-omics data sets in a pan-genome context and facilitate genome to phenome knowledge discovery.
Sub-objective 3a: Develop data standards/resources, tools and platforms for germplasm and phenotypic data management to support data-driven breeding strategies and population genomics research for crop improvement.
Sub-objective 3b: Extend existing CLIMtools infrastructure to rice and sorghum.
Sub-objective 3c: Maintain and extend our integrated data models to provide interactive visualizations of multi-omics data in a pan-genome context and support machine learning.
Objective 4: Provide community support services, build strategic partnerships, and provide database training and outreach activities with user communities and stakeholders.
Objective 5: Identify gene networks that govern plant development and response to the environment, to support the dissection of complex traits, using genomic, genetic, systems, and computational approaches in sorghum.
Sub-objective 5a: Inflorescence traits.
Sub-objective 5b: Nutrient use efficiency.
Approach:
The future of sustainable agriculture and crop breeding increasingly depends on integrating genetic resources with genomics, trait mapping, high-throughput phenotyping, and genome editing. However, challenges remain in converting large datasets into practical biological models and in establishing scalable information systems that stakeholders can leverage. This project will therefore launch strategic initiatives and collaborations, resulting in cyberinfrastructure development, new genomic resources, and hypothesis-based research.
The first four objectives encompass managing plant data, with periodic updates to SorghumBase and Gramene's pan-genome knowledge bases for maize, rice, and grapevine. Objective 1 increases the number of genomes to 40 per site, creates pan-gene sets, and uses them to annotate new genomes. Objective 2 expands public gene expression studies, focusing on 38 sorghum RNAseq and scRNAseq studies for curation and integration into the EBI Gene Atlas (2a) and the Sorghum EFP browser (2b). Additionally, Objective 2c will improve genetic variation data by updating and adding population studies and metadata, assigning Reference SNP IDs (RSIDs) for plants, and connecting functional data to variants. Adopting RSIDs as a standard for agricultural species ensures permanent identifiers for genetic variants and supports the linking of functional data.
Objective 3 expands functionality, including support for new data types, workflows, and visualizations. Objective 3a will establish standards, resources, tools, and platforms for germplasm and phenotypic data management, collecting data related to the Sorghum Association Panel and breeding lines to facilitate population genomics and data-driven breeding. Objective 3b will generate genotypes, geospatial information, and phenotypes for ~500 accessions in the USDA sorghum collection. This data will be used to establish associations between genetic variation, geospatial descriptors, and seed phenotypes, and will be made accessible via SorghumBase. Germplasm data from Objectives 3a/3b supports interoperability with the USDA Germplasm Resource Information Network. Objective 3c expands integrated data models, offers new interactive visualizations of the multi-omics data collected in Objectives 1, 2, and 3 within a pan-genome context, and facilitates machine learning.
Objective 4 establishes strategic partnerships and promotes broader outreach and training for stakeholders. These partnerships aid in developing and adopting data management standards and in coordinating project resources with ongoing community initiatives. Outreach and training encompass representation at conferences and workshops, coordination of working groups (e.g., genomes, variation, and phenotypes), and virtual and in-person training.
Objective 5 focuses on genomic and molecular analyses of sorghum lines derived from mutagenesis or diversity populations exhibiting physical traits related to inflorescence architecture, root system architecture, nutrient use efficiency, and contributors to yield components. These studies will contribute to the growing body of publicly available data generated by the research community and will be hosted on SorghumBase.