Location: Plant, Soil and Nutrition Research
Title: Unlocking Genomic Diversity: Harnessing the Power of FAIR Principles in SorghumBase
Author
TELLO-RUIZ, MARCELA - Cold Spring Harbor Laboratory
KUMAR, VIVEK - Cold Spring Harbor Laboratory
OLSON, ANDREW - Cold Spring Harbor Laboratory
WEI, SHARON - North Dakota Department Of Agriculture
Gladman, Nicholas
Ware, Doreen
Submitted to: Acta Microbiologica Sinica
Publication Type: Abstract Only
Publication Acceptance Date: 4/2/2024
Publication Date: N/A
Citation: N/A
Interpretive Summary:
Technical Abstract: The SorghumBase knowledgebase (https://www.sorghumbase.org) aligns with the FAIR guiding principles for research data stewardship, ensuring findability, accessibility, interoperability, and reusability. Genetic variation, vital for biodiversity and species adaptation, plays a pivotal role in sustainable agriculture, food security, and climate change mitigation. SorghumBase is a pan-genome resource projected to include more than 50 sorghum genomes and over 100 million genetic variants by the end of 2024. Currently, we host approximately 70 million variants, 1.6 million of which could severely disrupt 27,201 annotated protein-coding genes (i.e., 77% of the sorghum proteome) by producing non-functional proteins. Variants include 57 million SNPs and small indels from natural populations mapped to the BTx623 reference genome, and about 13 million EMS-induced mutations on a BTx623 background. The natural-population set includes nearly 33 million SNPs and 5 million indels genotyped by sequencing 400 SAP lines, and almost 13 million SNPs determined in 499 previously genotyped accessions from the TERRA-MEPP and TERRA-REF population panels and major cultivated sorghum races. All variants are filterable based on functional predictions and impact scores. Ongoing efforts involve coordination with GRIN Global and other stock centers for interoperability, expanding variant collections, incorporating additional EMS-induced and neutron-mutagenized variants, and establishing GWAS phenotypic associations using controlled vocabularies (e.g., trait ontology). Notably, we adopted 41 million reference SNP cluster identifiers (rsIDs) from the European Variation Archive, thus offering labels for unequivocal SNP identification. rsIDs enhance data standards for linking sequence to function, which fosters interoperability across agricultural resources, akin to practices in human research. Supported by USDA ARS (8062-21000-041-00D), SorghumBase stands at the forefront of comprehensive and accessible sorghum genetic information.