
Research Project: Genetic Enhancement of Seed Quality and Plant Health Traits, and Designing Soybeans with Improved Functionality

Location: Crop Production and Pest Control Research

Title: Organ-delimited gene regulatory networks provide high accuracy in candidate TF selection across diverse processes

Author
RANCAN, RAJEEV - Purdue University
SRIJAN, SONALI - Purdue University
BALEKUTTIRA, SOMAIAH - Purdue University
AGARWAL, TINA - Purdue University
RAMEY, MELISSA - Purdue University
DOBBINS, MADISON - Purdue University
WANG, XIAOJIN - Purdue University
Hudson, Karen
LI, YANG - Purdue University
VARALA, KRANTHI - Purdue University

Submitted to: Proceedings of the National Academy of Sciences (PNAS)
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/14/2024
Publication Date: 4/23/2024
Citation: Rancan, R., Srijan, S., Balekuttira, S., Agarwal, T., Ramey, M., Dobbins, M., Wang, X., Hudson, K.A., Li, Y., Varala, K. 2024. Organ-delimited gene regulatory networks provide high accuracy in candidate TF selection across diverse processes. Proceedings of the National Academy of Sciences (PNAS). https://doi.org/10.1073/pnas.2322751121.
DOI: https://doi.org/10.1073/pnas.2322751121

Interpretive Summary: This study examined a very large set of publicly available gene expression data and analyzed gene networks to identify genes that may regulate other genes. The accuracy of the predictions was tested by examining the role of these genes in the seed lipid biosynthesis pathway. Many previously known transcription factors were included in our predictions, and we demonstrated a very high success rate for the novel transcription factors predicted by our approach. The approach described in this study is applicable to any organism (plant or animal) that has a large body of public gene expression data.

Technical Abstract: Construction of organ-specific gene expression datasets that include hundreds to thousands of experiments would greatly aid reconstruction of gene regulatory networks at the organ level. However, creating precise organ sets is hampered by the need for extensive expert manual curation. Here we trained a supervised classification model that can accurately predict the plant organ from which a transcriptome sample was derived. This k-nearest-neighbors (KNN) multiclass classifier was used to generate organ-specific gene expression datasets for the leaf, root, shoot, flower, seed, seedling, silique, and stem organs in the model plant Arabidopsis thaliana. In each organ, a gene regulatory network (GRN) inference approach was used to determine (i) the influential transcription factors (TFs) in that organ and (ii) the most influential TFs for specific processes in that organ. These organ-delimited GRNs, built with no prior knowledge, rediscovered many known regulators of organ development and of specific processes in those organs. In addition, many previously unknown TF regulators ranked highly in all of these networks. As a proof of concept, we focus on experimentally validating the predicted TF regulators of lipid biosynthesis in seeds. Of the top twenty candidate TFs, eight (e.g., WRI1, LEC1, and FUS3) are known regulators of seed oil content, and we validate seven more candidate TFs, yielding a net accuracy rate of >75% for the de novo TF predictions. The general approach developed here is extensible to any species with sufficiently large gene expression datasets.
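
Illustrative sketch (not the authors' code): the abstract describes a KNN-based multiclass classifier that predicts the organ of origin of a transcriptome sample. The example below shows the general idea on fabricated data using only the Python standard library. Everything specific in it is an assumption made for illustration: the synthetic expression profiles, the three-organ subset, the Euclidean distance metric, k = 5, and all function names. The abstract does not state the features, metric, value of k, or software the study used.

```python
import math
import random
from collections import Counter

# Assumption: a three-organ subset of the eight organs named in the abstract.
ORGANS = ["leaf", "root", "seed"]


def make_synthetic_samples(n_per_organ, n_genes, rng):
    """Fabricate expression profiles (NOT real data).

    Each organ gets its own block of hypothetical 'marker' genes whose
    expression is shifted upward, so the classes are separable.
    """
    samples = []
    block = n_genes // len(ORGANS)
    for idx, organ in enumerate(ORGANS):
        for _ in range(n_per_organ):
            profile = [rng.gauss(1.0, 0.5) for _ in range(n_genes)]
            for gene in range(idx * block, (idx + 1) * block):
                profile[gene] += 3.0
            samples.append((profile, organ))
    return samples


def knn_predict(train, query, k=5):
    """Label a query profile by majority vote among its k nearest neighbors.

    Distance is Euclidean (an assumption). If two labels tie, the label
    that appears first among the nearest neighbors is returned.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of held-out samples whose organ label is predicted correctly."""
    hits = sum(knn_predict(train, profile, k) == organ for profile, organ in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = make_synthetic_samples(n_per_organ=30, n_genes=12, rng=rng)
test = make_synthetic_samples(n_per_organ=10, n_genes=12, rng=rng)
held_out_accuracy = accuracy(train, test)
print(f"held-out accuracy on synthetic data: {held_out_accuracy:.2f}")
```

In a workflow like the one the abstract outlines, a classifier of this kind would be trained on samples with curated organ labels and then applied to unlabeled public samples, so that each predicted organ group can serve as the input dataset for GRN inference. The accuracy printed here reflects only the fabricated data and says nothing about the performance reported in the study.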