Publication #414835

Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Unlocking Genomic Diversity: Harnessing the Power of FAIR Principles in SorghumBase

Author
item TELLO-RUIZ, MARCELA - Cold Spring Harbor Laboratory
item KUMAR, VIVEK - Cold Spring Harbor Laboratory
item OLSON, ANDREW - Cold Spring Harbor Laboratory
item WEI, SHARON - North Dakota Department Of Agriculture
item Gladman, Nicholas
item Ware, Doreen

Submitted to: Acta Microbiologica Sinica
Publication Type: Abstract Only
Publication Acceptance Date: 4/2/2024
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: The SorghumBase knowledgebase (https://www.sorghumbase.org) follows the FAIR guiding principles for research data stewardship: findability, accessibility, interoperability, and reusability. Genetic variation, vital for biodiversity and species adaptation, plays a pivotal role in sustainable agriculture, food security, and climate change mitigation. SorghumBase is a pan-genome resource projected to include more than 50 sorghum genomes and over 100 million genetic variants by the end of 2024. Currently, we host approximately 70 million variants, 1.6 million of which could severely disrupt 27,201 annotated protein-coding genes (i.e., 77% of the sorghum proteome) by producing non-functional proteins. The collection falls into two groups. The first is 57 million SNPs and small indels from natural populations mapped to the BTx623 reference genome, including nearly 33 million SNPs and 5 million indels genotyped by sequencing 400 SAP lines, and almost 13 million SNPs determined in 499 previously genotyped accessions from the TERRA-MEPP and TERRA-REF population panels and the major cultivated sorghum races. The second is about 13 million EMS-induced mutations on a BTx623 background. All variants can be filtered by functional prediction and impact score. Ongoing efforts include coordinating with GRIN Global and other stock centers for interoperability, expanding the variant collections, incorporating additional EMS-induced and neutron-mutagenized variants, and establishing GWAS phenotypic associations using controlled vocabularies (e.g., trait ontology). Notably, we adopted 41 million reference SNP cluster identifiers (rsIDs) from the European Variation Archive, which provide labels for unequivocal SNP identification. rsIDs strengthen data standards for linking sequence to function, which fosters interoperability across agricultural resources, akin to practices in human research.
Supported by USDA ARS (8062-21000-041-00D), SorghumBase stands at the forefront of comprehensive and accessible sorghum genetic information.
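
The filtering by functional prediction and impact score described above can be illustrated with a minimal sketch. It assumes each variant record carries an identifier, an affected gene, a predicted consequence, and an impact tier; the tier names (MODIFIER, LOW, MODERATE, HIGH) follow the convention used by Ensembl VEP and are an assumption here, not a description of the SorghumBase interface. The `Variant` class, the helper functions, and all example identifiers and gene names are hypothetical.

```python
from dataclasses import dataclass

# Impact tiers ordered from least to most severe.
# Assumption: names follow the Ensembl VEP convention.
IMPACT_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}


@dataclass(frozen=True)
class Variant:
    variant_id: str   # e.g. an rsID; the values below are made up
    gene: str         # affected gene (placeholder names)
    consequence: str  # predicted functional consequence
    impact: str       # one of the keys in IMPACT_RANK


def filter_by_impact(variants, min_impact="HIGH"):
    """Keep variants whose impact tier is at least min_impact."""
    threshold = IMPACT_RANK[min_impact]
    return [v for v in variants if IMPACT_RANK[v.impact] >= threshold]


def genes_affected(variants):
    """Return the sorted, de-duplicated gene names hit by the given variants."""
    return sorted({v.gene for v in variants})


# Hypothetical records for illustration only.
EXAMPLE_VARIANTS = [
    Variant("rs_example_1", "gene_A", "stop_gained", "HIGH"),
    Variant("rs_example_2", "gene_A", "missense_variant", "MODERATE"),
    Variant("rs_example_3", "gene_B", "synonymous_variant", "LOW"),
    Variant("rs_example_4", "gene_C", "frameshift_variant", "HIGH"),
]

if __name__ == "__main__":
    severe = filter_by_impact(EXAMPLE_VARIANTS, "HIGH")
    print([v.variant_id for v in severe])
    print(genes_affected(severe))
```

Counting the distinct genes hit by high-impact variants, as `genes_affected` does, is the same kind of summary as the "1.6 million variants disrupting 27,201 genes" figure in the abstract, though the actual pipeline behind that figure is not described in this record.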