Location: Soybean and Nitrogen Fixation Research
Title: Optimal brain dissection in dense autoencoders: Towards determining feature importance in -omics data
Author
AMIN, SHAFIN - North Carolina State University
VAN DEN BROECK, LISA - North Carolina State University
DE SMET, IVE - Ghent University
Locke, Anna
SOZZANI, ROSANGELA - North Carolina State University
Submitted to: Workshop Proceedings
Publication Type: Proceedings
Publication Acceptance Date: 10/23/2023
Publication Date: 11/30/2023
Citation: Amin, S., Van Den Broeck, L., De Smet, I., Locke, A.M., Sozzani, R. 2023. Optimal brain dissection in dense autoencoders: Towards determining feature importance in -omics data. Workshop Proceedings. https://doi.org/10.1109/BIP60195.2023.10379275.
DOI: https://doi.org/10.1109/BIP60195.2023.10379275
Interpretive Summary: A new neural network algorithm called PF-NET was developed to classify proteins, and this new algorithm requires less prior information than other commonly used algorithms. PF-NET identified phosphatase proteins in a model plant species that have been experimentally validated but were not correctly identified by the older, more commonly used protein classification algorithm. PF-NET classified the soybean kinase and phosphatase protein families, which are important for stress signaling. These protein classifications were then used to help determine the protein signaling network that regulates cold stress responses in soybean seedlings. We identified important protein regulators of soybean temperature responses, which are important targets for future experiments and could be candidates for improving soybean temperature stress tolerance.
Technical Abstract: Molecular biology aims to understand the molecular basis of cellular responses, unravel dynamic regulatory networks, and model complex biological systems. However, these studies remain challenging in non-model species as a result of poor functional annotation of regulatory proteins, like kinases or phosphatases. To overcome this limitation, we developed a multi-layer neural network that annotates proteins by determining functionality directly from the protein sequence. We annotated the kinases and phosphatases in the non-model species, Glycine max (soybean), achieving a prediction sensitivity of up to 97%.
To demonstrate its applicability, we used our functional annotations in combination with Bayesian network principles to predict signaling cascades using time series phosphoproteomics. We shed light on phosphorylation cascades in soybean seedlings upon cold treatment and identified Glyma.10G173000 (TOI5) and Glyma.19G007300 (TOT3) as predicted key temperature response regulators in soybean. Importantly, the network inference does not rely upon known upstream kinases, kinase motifs, or protein interaction data, enabling de novo identification of kinase-substrate interactions. In addition to high accuracy and strong generalization, we showed that our functional prediction neural network is scalable to other model and non-model species, including Oryza sativa (rice), Zea mays (maize), Sorghum bicolor (sorghum), and Triticum aestivum (wheat). Taken together, we demonstrated a data-driven systems biology approach for non-model species leveraging our predicted upstream kinases and phosphatases.