Publication #399297

Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

Title: Creating a FAIR data ecosystem for incorporating single cell genomics data into agricultural G2P research

Author
KAPOOR, MUSKAN - Iowa State University
SOKOLOV, ALEXEY - EMBL-EBI
VENTURA, ENRIQUE SEPENA - EMBL-EBI
YORDANOVA, GALABINA - EMBL-EBI
PROVART, NICHOLAS - University of Toronto
PAPATHEODOROU, IRENE - EMBL-EBI
GEORGE, NANCY - EMBL-EBI
Ware, Doreen
KUMARI, SUNITA - Cold Spring Harbor Laboratory
TICKLE, TIMOTHY - Massachusetts Institute of Technology
COLE, BENJAMIN - Lawrence Berkeley National Laboratory
BURDETT, TONY - EMBL-EBI
HARRISON, PETER - EMBL-EBI
TUGGLE, CHRISTOPHER - Iowa State University

Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 10/14/2022
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: The agricultural genomics community has numerous data submission standards available, but little experience in describing and storing single cell (e.g. scRNAseq) data. Other single cell genomics infrastructure efforts, such as the Human Cell Atlas Data Coordination Platform (HCA DCP), have resources that could benefit our community. For example, the HCA DCP is integrated with Terra, a cloud-native workbench for computational biology developed by the Broad Institute, Verily and Microsoft that houses tools for scGenomics analysis at scale. We will describe a pilot-scale project to determine whether our current metadata standards for livestock and crops can be used to ingest scRNAseq datasets in a manner consistent with HCA DCP standards, and whether established resources (e.g. Terra) can be used to analyze the ingested data. Annotare, currently the most comprehensive data ingestion portal at the European Bioinformatics Institute (EMBL-EBI) for high-throughput sequencing datasets from plants, fungi, protists and animals (including human), ensures that sufficient metadata are collected to enable re-analysis and dissemination via the Single Cell Expression Atlas knowledgebase (SCEA). To support the use of controlled vocabularies, Annotare provides an ontology auto-complete function that lets users search for and apply appropriate terms from many ontologies. Submitted single cell data can then be readily processed and searched via the SCEA and transferred to the Galaxy analysis space for further analysis. All experiments submitted to ArrayExpress via Annotare are manually curated by bioinformaticians. A second portal, the FAANG portal, is limited to animal datasets and provides access to both bulk and scRNAseq data. scRNAseq data and metadata can be submitted to the FAANG portal using a semi-automated process in which files are validated using the HCA DCP metadata and data validation service.
Once incorporated, datasets are used to augment this resource for use by the scientific community. These files are also incorporated using EMBL-EBI’s HCA DCP ingestion service, and then transferred to Terra for further analysis. We intend to build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem to facilitate single cell-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.