
Research Project: Metabolic and Epigenetic Regulation of Nutritional Metabolism

Location: Children's Nutrition Research Center

Title: PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data

Author
JONNAKUTI, VENKATA - Baylor College of Medicine
WAGNER, ERIC - University of Rochester
MALETIC-SAVATIC, MIRJANA - Baylor College of Medicine
LIU, ZHANDONG - Baylor College of Medicine
YALAMANCHILI, HARI - Children's Nutrition Research Center (CNRC)

Submitted to: Cell Reports Methods
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/11/2024
Publication Date: 2/26/2024
Citation: Jonnakuti, V.S., Wagner, E.J., Maletic-Savatic, M., Liu, Z., Yalamanchili, H.K. 2024. PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data. Cell Reports Methods. 4:Article 100707. https://doi.org/10.1016/j.crmeth.2024.100707.
DOI: https://doi.org/10.1016/j.crmeth.2024.100707

Interpretive Summary: PolyAMiner-Bulk is a novel deep learning-based algorithm developed to decode alternative polyadenylation (APA) dynamics from bulk RNA-seq data. APA is a critical post-transcriptional mechanism that generates multiple mRNA isoforms from a single gene and plays a vital role in regulating gene expression. Misregulation of APA is associated with numerous diseases, including neurodegenerative disorders, cancer, and metabolic conditions. However, current computational methods for detecting cleavage and polyadenylation sites (C/PASs) and analyzing 3' UTR length variations in bulk RNA-seq data face major hurdles: inadequate C/PAS annotations, difficulty disentangling overlapping C/PASs, and difficulty pinpointing which specific APA sites change. These challenges become more pronounced in large-scale cohort studies, including nutrition studies. Using an attention-based deep learning architecture, PolyAMiner-Bulk models RNA as a language, capturing complex dependencies within RNA sequences to infer APA dynamics accurately. This tool holds significant potential for pediatric nutrition and obesity research. Identifying diet-induced APA changes and understanding how they affect gene expression in metabolic pathways could reveal new targets for dietary interventions for childhood obesity and diabetes. The tool can also help unravel APA-related mechanisms in metabolic diseases, providing insights that could lead to improved strategies for managing these conditions.

Technical Abstract: Alternative polyadenylation (APA) is a key post-transcriptional regulatory mechanism; yet, its regulation and impact on human diseases remain understudied. Existing bulk RNA sequencing (RNA-seq)-based APA methods predominantly rely on predefined annotations, severely impacting their ability to decode novel tissue- and disease-specific APA changes. Furthermore, they only account for the most proximal and distal cleavage and polyadenylation sites (C/PASs). Deconvoluting overlapping C/PASs and the inherently noisy 3' UTR coverage in bulk RNA-seq data pose additional challenges. To overcome these limitations, we introduce PolyAMiner-Bulk, an attention-based deep learning algorithm that accurately recapitulates C/PAS sequence grammar, resolves overlapping C/PASs, captures non-proximal-to-distal APA changes, and generates visualizations to illustrate APA dynamics. Utilizing an advanced deep learning model, C/PAS-BERT, PolyAMiner-Bulk aims for precise C/PAS identification and comprehensive APA analysis, bridging the gap in APA research using bulk RNA-seq data. Evaluation on multiple datasets demonstrates the performance of PolyAMiner-Bulk, which accurately identifies more APA changes than other methods. With the growing importance of APA and the abundance of bulk RNA-seq data, PolyAMiner-Bulk establishes a robust paradigm of APA analysis.
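
Illustrative note: the abstract's point about methods that consider only the most proximal and distal C/PASs can be made concrete with a small arithmetic sketch. The Python below is not PolyAMiner-Bulk's algorithm and does not use its code or API. It is a minimal illustration with hypothetical read counts for one hypothetical gene with three C/PASs, and the function names (`site_usage`, `usage_shift`, `two_site_proximal_fraction`) are invented for this example. It shows how a two-site summary can stay unchanged while usage of an intermediate site shifts substantially.

```python
from typing import List


def site_usage(counts: List[float]) -> List[float]:
    """Fraction of a gene's 3'-end reads assigned to each C/PAS."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def usage_shift(control: List[float], treated: List[float]) -> List[float]:
    """Per-site change in usage fraction (treated minus control)."""
    return [t - c for c, t in zip(site_usage(control), site_usage(treated))]


def two_site_proximal_fraction(counts: List[float]) -> float:
    """Proximal share when only the first and last sites are considered."""
    proximal, distal = counts[0], counts[-1]
    total = proximal + distal
    return proximal / total if total else 0.0


# Hypothetical read counts (assumed values), sites ordered proximal -> distal.
control = [45, 10, 45]
treated = [30, 40, 30]

# The two-site view sees the same proximal share in both conditions...
print(two_site_proximal_fraction(control), two_site_proximal_fraction(treated))

# ...while the per-site view shows the intermediate site gaining usage.
print([round(d, 2) for d in usage_shift(control, treated)])
```

In this toy case the proximal-to-distal ratio is identical across conditions, so a method restricted to those two sites would report no change, even though the middle site's share of reads rises. Real analyses also have to estimate the site positions and counts from noisy coverage, which this sketch takes as given.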