Location: Plant, Soil and Nutrition Research
Title: Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences
Author
DENG, CECILIA - New Zealand Institute Of Plant & Food Research
NAITHANI, SUSHMA - Oregon State University
KUMARI, SUNITA - Cold Spring Harbor Laboratory
COBO-SIMON, IRENE - University Of Connecticut
QUEZADA-RODRIQUEZ, ELSA - Universidad Nacional Autonoma De Mexico
SKRABISOVA, MARIA - Palacky University
Gladman, Nicholas
CORREL, MELANIE - University Of Florida
SIKIRU, AKEEM BABATUNDE - Federal University Of Agriculture, Abeokuta
JUNG, SOOK - Washington State University
AFUWAPE, OLUSOLA - University Of Lagos
MARRANO, ANNARITA - Phoenix Bioinformatics
REBOLLO, INES - Universidad De La República
ZHANG, WENTAO - National Research Council - Canada
Submitted to: Database: The Journal of Biological Databases and Curation
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/12/2024
Publication Date: 12/11/2023
Citation: Deng, C.H., Naithani, S., Kumari, S., Cobo-Simon, I., Quezada-Rodriquez, E.H., Skrabisova, M., Gladman, N.P., Correl, M.J., Sikiru, A., Jung, S., Afuwape, O., Marrano, A., Rebollo, I., Zhang, W. 2023. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database: The Journal of Biological Databases and Curation. https://doi.org/10.1093/database/baad088.
DOI: https://doi.org/10.1093/database/baad088
Interpretive Summary: As part of our group's effort to participate in and drive data standards and their implementation in alignment with the Findable, Accessible, Interoperable, and Reusable (FAIR) principles, this work was published by the AgBioData Consortium Genotype-Phenotype Working Group. The publication describes current best practices and considerations, implementable across agricultural systems, for a variety of genomics data. Future considerations include expansion of infrastructure support, continued development of community standards, resources for biocuration, and other recommendations.
Technical Abstract: Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization.
We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021–22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.