Research Project: MaizeGDB: Enabling Access to Basic, Translational, and Applied Research Information

Location: Corn Insects and Crop Genetics Research

Title: Maize feature store: a centralized resource to manage and analyze curated maize multi-omics features for machine learning applications

Author
Sen, Shatabdi - Iowa State University
Woodhouse, Margaret
Portwood, John
Andorf, Carson

Submitted to: Database: The Journal of Biological Databases and Curation
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/9/2023
Publication Date: 11/6/2023
Citation: Sen, S., Woodhouse, M.H., Portwood II, J.L., Andorf, C.M. 2023. Maize feature store: a centralized resource to manage and analyze curated maize multi-omics features for machine learning applications. Database: The Journal of Biological Databases and Curation. 2023. Article baad078. https://doi.org/10.1093/database/baad078.
DOI: https://doi.org/10.1093/database/baad078

Interpretive Summary: The analysis of large-scale, complex data associated with maize genomes is increasingly used to promote genetic research and improve agronomic traits. As a result, efforts have increased to integrate these diverse datasets and extract meaning from their relationships. Currently, there are no tools to rapidly link and evaluate these data and their relationship to genes. In our work, we present the Maize Feature Store (MFS), a versatile application that combines features built on complex data to facilitate gene exploration, modeling, and analysis. In addition, the MFS integrates various machine-learning algorithms that can significantly simplify the prediction of complex genome annotations and functions, and is capable of creating an accurate classification model for predicting genes conserved across all varieties of maize.

Technical Abstract: The big-data analysis of multi-omics data associated with maize genomes is increasingly utilized to accelerate genetic research and improve agronomic traits. As a result, efforts have increased to integrate diverse datasets and extract meaning from these measurements. In the past, multiple multi-layer data structures have been proposed for the integration of multi-omics biological information associated with maize, but none have incorporated an unstructured omics data warehouse while also providing end-to-end solutions for evaluating and linking these data to target gene annotations. In our work, we present the Maize Feature Store (MFS), a versatile NoSQL-based Flask application that combines features built on multi-omics data to facilitate exploration and modeling of a broader spectrum of heterogeneous information via a variety of univariate, bivariate, and multivariate analysis modules. In addition, MFS integrates various machine-learning algorithms, both supervised and unsupervised, that can significantly simplify the analysis and prediction of complex genome annotations. The MFS was capable of creating an accurate pan-genome classification model with an AUC-ROC score of 0.853. The functionality of MFS can be freely accessed via an online webserver (https://mfs.maizegdb.org/).
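
For readers unfamiliar with the AUC-ROC metric quoted above, the following minimal Python sketch shows how such a score can be computed for a binary classifier. It is not the MFS implementation: the function name `auc_roc`, the toy labels and scores, and the reading of label 1 as a gene conserved across varieties are all assumptions made for illustration. The score is the fraction of positive/negative pairs in which the positive example receives the higher predicted score, with ties counted as half.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels.
    scores: iterable of predicted scores, higher meaning "more likely 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from MFS:
# 1 = gene conserved across varieties, 0 = not conserved (assumed labeling).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.3f}")
```

In this toy case 8 of the 9 positive/negative pairs are ordered correctly, so the score is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported 0.853.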