Location: Crop Production and Pest Control Research
Title: Organ-delimited gene regulatory networks provide high accuracy in candidate transcription factor selection across diverse processes
Author
RANJAN, RAJEEV - Purdue University
SRIJAN, SONALI - Purdue University
BALEKUTTIRA, SOMAIAH - Purdue University
AGARWAL, TINA - Purdue University
RAMEY, MELISSA - Purdue University
DOBBINS, MADISON - Purdue University
KUHN, RACHEL - Purdue University
WANG, XIAOJIN - Purdue University
Hudson, Karen
LI, YANG - Purdue University
VARALA, KRANTHI - Purdue University
Submitted to: Proceedings of the National Academy of Sciences (PNAS)
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/14/2024
Publication Date: 4/23/2024
Citation: Ranjan, R., Srijan, S., Balekuttira, S., Agarwal, T., Ramey, M., Dobbins, M., Kuhn, R., Wang, X., Hudson, K.A., Li, Y., Varala, K. 2024. Organ-delimited gene regulatory networks provide high accuracy in candidate transcription factor selection across diverse processes. Proceedings of the National Academy of Sciences (PNAS). https://doi.org/10.1073/pnas.2322751121.
DOI: https://doi.org/10.1073/pnas.2322751121
Interpretive Summary: This study examined a very large set of publicly available gene expression data and analyzed gene networks to identify genes that may regulate other genes. The accuracy of the predictions was tested by examining the role of these genes in the seed lipid biosynthesis pathway. Many previously known transcription factors were included in our predictions, and we demonstrated a very high success rate for the novel transcription factors predicted by our approach. The approach described in this study is broadly applicable to any organism (plant or animal) that has a large body of public gene expression data.
Technical Abstract: Construction of organ-specific gene expression datasets that include hundreds to thousands of experiments would greatly aid reconstruction of gene regulatory networks at the organ level. However, creating precise organ sets is hampered by the need for a great deal of expert manual curation. Here we trained a supervised classification model that can accurately predict the plant organ from which a transcriptome sample was derived. This KNN-based multiclass classifier was used to generate organ-specific gene expression datasets for the leaf, root, shoot, flower, seed, seedling, silique and stem organs in the model plant Arabidopsis thaliana. In each organ, a gene regulatory network (GRN) inference approach was used to determine: (i) influential transcription factors (TFs) in that organ and (ii) the most influential TFs for specific processes in the organ. These organ-delimited GRNs, with no prior knowledge, rediscovered many known regulators of organ development and of specific processes in those organs. In addition, many unknown TF regulators were ranked high in all these networks. We focus on experimentally validating the predicted TF regulators of lipid biosynthesis in seeds as a proof-of-concept. Of the top twenty candidate TFs, eight (e.g., WRI1, LEC1 and FUS3) are known regulators of seed oil content, and we validate seven more candidate TFs, yielding a net accuracy rate of >75% for the de novo TF predictions. The general approach developed here is extensible to any species with sufficiently large gene expression datasets.
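Illustrative sketch: The abstract describes a KNN-based multiclass classifier that predicts the organ of origin for a transcriptome sample. The short Python example below shows the general k-nearest-neighbors idea only. The function name `knn_predict`, the three-gene expression vectors, the Euclidean distance metric, and the value of k are all assumptions made for illustration; the record above does not state the features, distance measure, or parameters the authors used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict an organ label for `query` by majority vote of its k nearest
    training samples. `train` is a list of (expression_vector, organ_label).
    Hypothetical helper for illustration; not the authors' implementation."""
    # Euclidean distance from the query to every labeled sample (assumed metric)
    ranked = sorted((math.dist(vec, query), label) for vec, label in train)
    # Count the organ labels among the k closest samples
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up expression profiles over three hypothetical genes
train = [
    ([9.0, 1.0, 0.5], "leaf"),
    ([8.5, 1.2, 0.4], "leaf"),
    ([1.0, 9.0, 0.6], "root"),
    ([1.3, 8.7, 0.5], "root"),
    ([0.8, 1.1, 9.2], "seed"),
    ([1.0, 0.9, 8.8], "seed"),
]

# An unlabeled sample whose profile resembles the "seed" examples
print(knn_predict(train, [1.1, 1.0, 9.0]))
```

In a real setting each vector would span thousands of genes and the labeled set would come from curated samples; samples assigned to the same organ could then be pooled into the organ-specific datasets the abstract refers to.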