Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: Predicting gene structure changes resulting from genetic variants via exon definition features

Author
MAJOROS, WILLIAM - Duke University
HOLT, CARSON - University of Utah
CAMPBELL, MICHAEL - Cold Spring Harbor Laboratory
Ware, Doreen
YANDELL, MARK - University of Utah
REDDY, TIMOTHY - Duke University

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/24/2018
Publication Date: 4/25/2018
Citation: Majoros, W., Holt, C., Campbell, M., Ware, D., Yandell, M., Reddy, T. 2018. Predicting gene structure changes resulting from genetic variants via exon definition features. Bioinformatics. 34(21):3616-3623.

Interpretive Summary: Within a species, genome sequences differ between individuals. When this “genetic variation” falls in a gene region, it can alter the gene product and, potentially, the function of the gene. In this paper, we describe a method for predicting whether the gene product is likely to change and whether that change may affect gene function. We demonstrated the method using both real and simulated data and found that it identifies changes that can have large effects on the structure of the gene's transcript, thereby affecting the resulting protein. In some cases, these changes could help explain the underlying causes of human disease or differences in plant performance.

Technical Abstract: MOTIVATION: Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. RESULTS: We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease.
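
To make the idea of a variant that creates a de novo splice site concrete, the following toy sketch checks whether a single-base substitution introduces a new canonical donor (GT) or acceptor (AG) dinucleotide that was absent from the reference sequence. It is not the probabilistic model described in the abstract: the function names, the input format, and the dinucleotide-only criterion are assumptions made purely for illustration. A matching dinucleotide alone does not predict splicing, since it ignores the surrounding exon definition features.

```python
# Toy illustration only: this is NOT the model described in the abstract.
# All names and the input format below are hypothetical.
DONOR = "GT"     # canonical 5' splice-site (donor) dinucleotide
ACCEPTOR = "AG"  # canonical 3' splice-site (acceptor) dinucleotide


def motif_positions(seq, motif):
    """Return the set of 0-based start positions where motif occurs in seq."""
    width = len(motif)
    return {i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif}


def de_novo_sites(ref, pos, alt):
    """List candidate splice-site dinucleotides created by one substitution.

    ref: reference DNA sequence; pos: 0-based position; alt: substituted base.
    Returns (site_type, start) pairs found in the variant but not the reference.
    """
    ref = ref.upper()
    var = ref[:pos] + alt.upper() + ref[pos + 1:]
    created = []
    for name, motif in (("donor", DONOR), ("acceptor", ACCEPTOR)):
        new = motif_positions(var, motif) - motif_positions(ref, motif)
        created.extend((name, start) for start in sorted(new))
    return created
```

For example, calling `de_novo_sites("CCGACC", 3, "T")` compares the reference with the variant sequence `CCGTCC` and reports the GT dinucleotide that the substitution creates. A realistic predictor would score such candidates against their sequence context rather than flag every match.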