Research Project: Exploiting Genetic Diversity through Genomics, Plant Physiology, and Plant Breeding to Increase Competitiveness of U.S. Soybeans in Global Markets

Location: Soybean and Nitrogen Fixation Research

Title: Optimal brain dissection in dense autoencoders: Towards determining feature importance in -omics data

Author
AMIN, SHAFIN - North Carolina State University
VAN DEN BROECK, LISA - North Carolina State University
DE SMET, IVE - Ghent University
Locke, Anna
SOZZANI, ROSANGELA - North Carolina State University

Submitted to: Workshop Proceedings
Publication Type: Proceedings
Publication Acceptance Date: 10/23/2023
Publication Date: 11/30/2023
Citation: Amin, S., Van Den Broeck, L., De Smet, I., Locke, A.M., Sozzani, R. 2023. Optimal brain dissection in dense autoencoders: Towards determining feature importance in -omics data. Workshop Proceedings. https://doi.org/10.1109/BIP60195.2023.10379275.
DOI: https://doi.org/10.1109/BIP60195.2023.10379275

Interpretive Summary: A new neural network algorithm called PF-NET was developed to classify proteins, and this new algorithm requires less prior information than other commonly used algorithms. PF-NET identified phosphatase proteins in a model plant species that have been experimentally validated but were not correctly identified by the older, more commonly used protein classification algorithm. PF-NET classified the soybean kinase and phosphatase protein families, which are important for stress signaling. These protein classifications were then used to help determine the protein signaling network that regulates cold stress responses in soybean seedlings. We identified important protein regulators of soybean temperature responses, which are important targets for future experiments and could be candidates for improving soybean temperature stress tolerance.

Technical Abstract: Molecular biology aims to understand the molecular basis of cellular responses, unravel dynamic regulatory networks, and model complex biological systems. However, such studies remain challenging in non-model species because regulatory proteins, such as kinases and phosphatases, are poorly functionally annotated. To overcome this limitation, we developed a multi-layer neural network that annotates proteins by determining functionality directly from the protein sequence. We annotated the kinases and phosphatases in the non-model species Glycine max (soybean), achieving a prediction sensitivity of up to 97%. To demonstrate its applicability, we combined our functional annotations with Bayesian network principles to predict signaling cascades from time-series phosphoproteomics data. We shed light on phosphorylation cascades in soybean seedlings upon cold treatment and identified Glyma.10G173000 (TOI5) and Glyma.19G007300 (TOT3) as predicted key temperature response regulators in soybean. Importantly, the network inference does not rely on known upstream kinases, kinase motifs, or protein interaction data, enabling de novo identification of kinase-substrate interactions. In addition to high accuracy and strong generalization, we showed that our functional prediction neural network is scalable to other model and non-model species, including Oryza sativa (rice), Zea mays (maize), Sorghum bicolor (sorghum), and Triticum aestivum (wheat). Taken together, these results demonstrate a data-driven systems biology approach for non-model species that leverages our predicted upstream kinases and phosphatases.
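
For readers unfamiliar with the metric, the prediction sensitivity quoted above is the fraction of true positives that a classifier recovers, TP / (TP + FN). The sketch below is illustrative only: the function name and the toy labels are hypothetical and are not taken from the publication or its data.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall) for the positive class: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Hypothetical labels (assumption): 1 = kinase, 0 = not a kinase.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

result = sensitivity(y_true, y_pred)
print(f"sensitivity = {result:.2f}")
```

Sensitivity ignores false positives, so a value such as 97% says how many known kinases or phosphatases were recovered, not how many of the predicted ones are correct.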