Publication #382631

Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

Title: Ranked choice voting for representative transcripts with TRaCE

Authors
Olson, Andrew - Cold Spring Harbor Laboratory
Ware, Doreen

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/5/2021
Publication Date: 7/23/2021
Citation: Olson, A., Ware, D. 2021. Ranked choice voting for representative transcripts with TRaCE. Bioinformatics. 2021, btab542. https://doi.org/10.1093/bioinformatics/btab542.
DOI: https://doi.org/10.1093/bioinformatics/btab542

Interpretive Summary: Genome annotation involves determining the locations of genes and modeling the structure of their transcripts. Comparative analysis often requires that a single canonical transcript be selected for each gene model. We developed TRaCE (Transcript Ranking and Canonical Election) to solve this problem using a method inspired by ranked choice voting. A population of "voters" ranks transcripts by similarity, and an election selects the transcript that is most frequently the first choice. Ties are broken by tallying second-choice votes. Using only data that are typically on hand in a genome annotation project, this method works as well as more complex methods.

Technical Abstract: Genome sequencing projects annotate protein-coding gene models with multiple transcripts, aiming to represent all of the available transcript evidence. However, downstream analyses often operate on only one representative transcript per gene locus, sometimes known as the canonical transcript. To choose canonical transcripts, TRaCE (Transcript Ranking and Canonical Election) holds an ‘election’ in which a set of RNA-seq samples rank transcripts by annotation edit distance. These sample-specific votes are tallied along with other criteria such as protein length and InterPro domain coverage. The winner is selected as the canonical transcript, but the election proceeds through multiple rounds of voting to order all the transcripts by relevance. Depending on the expression data provided, TRaCE can identify the most common isoforms from a broad expression atlas or prioritize alternative transcripts expressed in specific contexts.
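
The election described above can be illustrated with a minimal Python sketch. This is not the TRaCE implementation. It assumes that each RNA-seq sample ranks transcripts from lowest to highest annotation edit distance (lower meaning closer agreement with that sample's evidence), that ties in first-choice votes are broken by progressively later choices, and that later rounds simply remove the previous winner and vote again. The function names (`ballots_from_aed`, `elect`, `rank_transcripts`), the transcript IDs, and the distance values are hypothetical, and the additional criteria mentioned in the abstract (protein length, InterPro domain coverage) are left out.

```python
from collections import Counter

def ballots_from_aed(aed_by_sample):
    """Turn per-sample distances into ballots, best (lowest distance) first."""
    return [sorted(scores, key=scores.get) for scores in aed_by_sample.values()]

def elect(ballots, candidates):
    """Pick the candidate with the most first-choice votes.

    Ties are broken by tallying second choices, then third choices, and so on.
    Any tie that remains is resolved alphabetically (an assumption of this sketch).
    """
    tied = set(candidates)
    for depth in range(len(candidates)):
        tally = Counter()
        for ballot in ballots:
            # Ignore transcripts that were already elected in earlier rounds.
            ranked = [t for t in ballot if t in candidates]
            if depth < len(ranked) and ranked[depth] in tied:
                tally[ranked[depth]] += 1
        if not tally:
            break
        best = max(tally.values())
        tied = {t for t in tied if tally[t] == best}
        if len(tied) == 1:
            break
    return min(tied)

def rank_transcripts(ballots):
    """Order all transcripts by repeated elections, removing each winner."""
    remaining = {t for ballot in ballots for t in ballot}
    order = []
    while remaining:
        winner = elect(ballots, remaining)
        order.append(winner)
        remaining.remove(winner)
    return order

# Hypothetical distances for three transcripts of one gene in four samples.
aed = {
    "sample1": {"T1": 0.10, "T2": 0.30, "T3": 0.50},
    "sample2": {"T1": 0.20, "T2": 0.10, "T3": 0.40},
    "sample3": {"T1": 0.40, "T2": 0.20, "T3": 0.10},
    "sample4": {"T1": 0.05, "T2": 0.25, "T3": 0.60},
}
ranking = rank_transcripts(ballots_from_aed(aed))
print(ranking)  # ['T1', 'T2', 'T3']; T1 would be the canonical transcript
```

In this sketch, changing the set of samples changes the ballots, which is one way to picture how a broad expression atlas and a context-specific set of samples could elect different canonical transcripts.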