Location: Plant, Soil and Nutrition Research
Title: Standardizing biocuration of genetic variation data to promote FAIRification
Author
TELLO-RUIZ, MARCELA - Cold Spring Harbor Laboratory
ALI, KAZIM - University Of Karachi
Ali, Gul - Shad
Bassil, Nahla
BEIER, SEBASTIAN - Ibg-4 Bioinformatics
Bushakra, Jill
COBO-SIMON, IRENE - Instituto Nacional De Investigacion Y Technologia Agraria Y Alimentaria
Ware, Doreen
WEI, SHARON - Cold Spring Harbor Laboratory
CEZARD, TIMOTHEE - Embl-Ebi
DYER, SARAH - Embl-Ebi
Gutierrez, Osman
Harrison, Melanie
HUMANN, JODI - Washington State University
KUMAR, VIVEK - Cold Spring Harbor Laboratory
Nelson, Rex
SALAVATI, MAZDAK - Roslin Institute
SHEEHAN, MOIRA - Cornell University
Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 1/12/2024
Publication Date: N/A
Citation: N/A
Interpretive Summary:
Technical Abstract: The Standards for Genetic Variation Data Working Group of the AgBioData Consortium brings together a community of biocurators, data providers, bioinformaticians, and computer scientists engaged in agricultural research. Late this year, the Public Genetic Resources Working Group merged with our group. Our working group’s primary tasks have evolved into the harmonization and adoption of standards for genotypic and phenotypic variation data across diverse platforms in the plant and animal kingdoms. Additionally, the group aims to promote interoperability and facilitate access to these datasets for researchers and breeders. Thanks to the FAANG (Functional Annotation of ANimal Genomes) project, there has been considerable progress in the adoption and dissemination of metadata standards for animal genetic variants. In plants, the first guidelines for findable, accessible, interoperable, and reusable (FAIR) handling of genetic variants were published in 2022. This involved direct collaboration with EMBL-EBI, one of the International Nucleotide Sequence Database Collaboration (INSDC) pillars, to support data submission to BioSamples and the European Variation Archive (EVA) global repository. A preliminary checklist was provided to classify and validate data and metadata, a significant step towards enhancing data availability. The Standards for Genetic Variation Data Working Group has broadened these guidelines with recommendations to crosslink sample identifiers with agricultural resources, specifically germplasm repositories like USDA-ARS GRIN (Germplasm Resources Information Network)-Global. The group also suggests including synonyms for common sample names and traceable population panel associations.
We surveyed the AgBioData community, namely species-specific and clade-wide databases, germplasm repositories, and independent data producers. The goal was to gather information on existing and anticipated genetic variation data sets to facilitate adoption of standards and promote interoperability between resources. In addition, we identified new challenges, such as the lack of reference genome assemblies in an INSDC repository or genetic variation data not publicly available in a standard form (e.g., a VCF file), and discussed potential solutions and sustainability workflows. These include adapting and further developing tools used to address similar problems encountered previously with human data. We will showcase how such challenges are being addressed. Progress towards the above objectives, along with training for data generators who submit data to public repositories, is critical to making genetic variation data more FAIR for agroscience. Future plans look to link variation data sets with phenotypic data to support association studies and advance breeding approaches.