Location: Genetics and Animal Breeding
Title: Long read isoform sequencing reveals hidden transcriptional complexity between cattle subspecies
Author
REN, YAN - University Of Adelaide
TSENG, ELIZABETH - Pacific Biosciences Inc
Smith, Timothy - Tim
HIENDLEDER, STEFAN - University Of Adelaide
WILLIAMS, JOHN - University Of Adelaide
LOW, WAI-YEE - University Of Adelaide
Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/27/2023
Publication Date: 3/13/2023
Citation: Ren, Y., Tseng, E., Smith, T.P.L., Hiendleder, S., Williams, J.L., Low, W. 2023. Long read isoform sequencing reveals hidden transcriptional complexity between cattle subspecies. BMC Genomics. 24. Article 108. https://doi.org/10.1186/s12864-023-09212-9.
DOI: https://doi.org/10.1186/s12864-023-09212-9
Interpretive Summary: Short sequence reads, which represent fragments of expressed RNA and reflect gene usage in a tissue, are now routinely used. However, working with these fragments of the much longer RNA molecules (called transcripts) produced from genes has complications, including the fact that each gene can produce multiple related but distinct transcripts. Using short sequences to "re-create" the transcripts present in the cell unavoidably introduces errors. More recent technology can sequence the entire length of a transcript in a single read, which eliminates the need to predict transcript sequence from short reads and increases the accuracy of transcript identification. An important issue that has not received substantial attention is whether these two approaches, called RNA-seq for short reads and Iso-Seq for long reads, provide equivalent information about the level of gene expression within a tissue or about which genes are expressed differently between two tissue samples. This manuscript directly compares the two approaches and determines that they are not as highly correlated as one might hope. It also provides an initial characterization to try to determine which approach has higher accuracy.
Technical Abstract: The Iso-Seq method of full-length cDNA sequencing is suitable to quantify differentially expressed genes (DEGs), transcripts (DETs) and transcript usage (DTU). However, the higher cost of Iso-Seq relative to RNA-seq has limited comparisons of the two methods. Transcript abundances estimated from RNA-seq and deep Iso-Seq data for fetal liver from two cattle subspecies were compared to evaluate concordance. Inter-sample correlation of gene- and transcript-level abundance was higher within technology than between technologies. Identification of DEGs between the cattle subspecies depended on sequencing method, with only 44 genes identified by both, including 6 novel genes annotated by Iso-Seq. There was a pronounced difference between Iso-Seq and RNA-seq results at the transcript level, where Iso-Seq revealed several orders of magnitude more transcript abundance and usage differences between subspecies. Factors influencing DEG identification included size selection during Iso-Seq library preparation, average transcript abundance, multi-mapping of RNA-seq reads to the reference genome, and overlapping coordinates of genes. Some DEGs called by RNA-seq alone appear to be sequence duplication artifacts. Among the 44 DEGs identified by both technologies, some play a role in the immune system, thyroid function and cell growth. Iso-Seq revealed hidden transcriptional complexity in DEGs, DETs and DTU genes between cattle subspecies that was previously missed by RNA-seq.