Location: Plant, Soil and Nutrition Research
Title: Ranked choice voting for representative transcripts with TRaCE
Authors: Olson, Andrew - Cold Spring Harbor Laboratory; Ware, Doreen
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/5/2021
Publication Date: 7/23/2021
Citation: Olson, A., Ware, D. 2021. Ranked choice voting for representative transcripts with TRaCE. Bioinformatics. 2021, btab542.
DOI: https://doi.org/10.1093/bioinformatics/btab542

Interpretive Summary: Genome annotation involves determining the locations of genes and modeling the structure of their transcripts. Comparative analysis often requires that a single canonical transcript be selected for each gene model. We developed TRaCE (Transcript Ranking and Canonical Election) to solve this problem using a method inspired by ranked choice voting. A population of "voters" rank transcripts by similarity, and an election selects the transcript that is most frequently the first choice. Ties are broken by tallying second choice transcript votes. This method works as well as more complex methods based on data that are typically on hand in a genome annotation project.

Technical Abstract: Genome sequencing projects annotate protein-coding gene models with multiple transcripts, aiming to represent all of the available transcript evidence. However, downstream analyses often operate on only one representative transcript per gene locus, sometimes known as the canonical transcript. To choose canonical transcripts, TRaCE (Transcript Ranking and Canonical Election) holds an ‘election’ in which a set of RNA-seq samples rank transcripts by annotation edit distance. These sample-specific votes are tallied along with other criteria such as protein length and InterPro domain coverage. The winner is selected as the canonical transcript, but the election proceeds through multiple rounds of voting to order all the transcripts by relevance. Based on the set of expression data provided, TRaCE can identify the most common isoforms from a broad expression atlas or prioritize alternative transcripts expressed in specific contexts.
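
The election described in the summary and abstract can be illustrated with a minimal Python sketch. This is not the TRaCE implementation: the function names (rank_by_distance, elect, order_transcripts), the sample names, and the annotation edit distance values are all hypothetical, and the sketch leaves out the additional criteria mentioned in the abstract (protein length and InterPro domain coverage). It assumes each sample ranks transcripts from lowest to highest distance, that the winner is the transcript with the most first-choice votes, that ties are broken by votes at later ballot positions, and that the election is repeated on the remaining transcripts to order them all.

```python
from collections import Counter


def rank_by_distance(distances):
    """Build one voter's ballot: transcript IDs from smallest to largest distance."""
    # The transcript ID is a secondary sort key so equal distances rank deterministically.
    return sorted(distances, key=lambda t: (distances[t], t))


def elect(ballots, candidates):
    """Return the candidate with the most first-choice votes.

    Ties are broken by votes at the second position, then the third, and so on.
    If a tie survives every position, the alphabetically first ID wins
    (an arbitrary fallback chosen for this sketch).
    """
    # Restrict every ballot to the candidates still in the running.
    filtered = [[t for t in ballot if t in candidates] for ballot in ballots]
    tied = sorted(candidates)
    for position in range(len(candidates)):
        tally = Counter(b[position] for b in filtered if len(b) > position)
        best = max(tally[t] for t in tied)
        tied = [t for t in tied if tally[t] == best]
        if len(tied) == 1:
            break
    return tied[0]


def order_transcripts(ballots):
    """Hold repeated elections, removing each winner, to order all transcripts."""
    remaining = {t for ballot in ballots for t in ballot}
    order = []
    while remaining:
        winner = elect(ballots, remaining)
        order.append(winner)
        remaining.remove(winner)
    return order


# Hypothetical per-sample distances for three transcripts of one gene.
samples = {
    "leaf": {"T1": 0.10, "T2": 0.30, "T3": 0.50},
    "root": {"T1": 0.40, "T2": 0.20, "T3": 0.60},
    "seed": {"T1": 0.15, "T2": 0.25, "T3": 0.05},
}
ballots = [rank_by_distance(d) for d in samples.values()]
ranking = order_transcripts(ballots)
print("canonical:", ranking[0])
print("full order:", ranking)
```

In this toy input each transcript receives one first-choice vote, so the first round is decided by second-choice votes: T1 is second on two ballots and is elected as canonical. Changing which samples vote changes the outcome, which mirrors the abstract's point that the choice of expression data determines whether common or context-specific isoforms are prioritized.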