
Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences

Author
item DENG, CECILIA - New Zealand Institute Of Plant & Food Research
item NAITHANI, SUSHMA - Oregon State University
item KUMARI, SUNITA - Cold Spring Harbor Laboratory
item COBO-SIMÓN, IRENE - University Of Connecticut
item QUEZADA-RODRÍGUEZ, ELSA - Universidad Nacional Autonoma De Mexico
item SKRABISOVA, MARIA - Palacky University
item Gladman, Nicholas
item CORRELL, MELANIE - University Of Florida
item SIKIRU, AKEEM BABATUNDE - Howard Hughes Medical Institute
item JUNG, SOOK - Washington State University
item AFUWAPE, OLUSOLA - University Of Lagos
item MARRANO, ANNARITA - Phoenix Bioinformatics
item REBOLLO, INES - Universidad De La República
item ZHANG, WENTAO - National Research Council - Canada

Submitted to: Database: The Journal of Biological Databases and Curation
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/12/2024
Publication Date: 12/11/2023
Citation: Deng, C.H., Naithani, S., Kumari, S., Cobo-Simón, I., Quezada-Rodríguez, E.H., Skrabisova, M., Gladman, N.P., Correll, M.J., Sikiru, A., Jung, S., Afuwape, O., Marrano, A., Rebollo, I., Zhang, W. 2023. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database: The Journal of Biological Databases and Curation. 2023. https://doi.org/10.1093/database/baad088.
DOI: https://doi.org/10.1093/database/baad088

Interpretive Summary:

Technical Abstract: Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021–22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.