Research Project: Developing a Systems Biology Approach to Enhance Efficiency and Sustainability of Beef and Lamb Production

Location: Genetics and Animal Breeding

Title: Global analysis of transcription start sites in the new ovine reference genome (Oar rambouillet v1.0)

Author
SALAVATI, MAZDAK - Roslin Institute
CAULTON, ALEX - Agresearch
CLARK, RICHARD - University Of Edinburgh
GAZOVA, IVETA - Roslin Institute
Smith, Timothy - Tim
WORLEY, KIM - Baylor College Of Medicine
COCKETT, NOELLE - Utah State University
ARCHIBALD, ALAN - Roslin Institute
CLARKE, SHANNON - Agresearch
MURDOCH, BRENDA - University Of Idaho
CLARK, EMILY - Roslin Institute

Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/9/2020
Publication Date: 10/23/2020
Citation: Salavati, M., Caulton, A., Clark, R., Gazova, I., Smith, T.P.L., Worley, K.C., Cockett, N.E., Archibald, A.L., Clarke, S., Murdoch, B.M., Clark, E.L. 2020. Global analysis of transcription start sites in the new ovine reference genome (Oar rambouillet v1.0). Frontiers in Genetics. Article e580580. https://doi.org/10.3389/fgene.2020.580580.
DOI: https://doi.org/10.3389/fgene.2020.580580

Interpretive Summary: The Ovine "Functional Annotation of Animal Genomes" (FAANG) consortium aims to identify the regions of the genome that affect the expression of genes. Many of these regions, known as "control elements", lie in segments of DNA close to the points where transcription of each gene into RNA begins, called "transcription start sites" (TSS). We therefore need to know where the TSS lie in the genome in order to correctly identify control elements. This work uses a technique called CAGE that "captures" the TSS by finding the ends of the RNA made from each gene and matching them to the genome. Our CAGE analysis identified nearly 30,000 high-confidence TSS in the ovine genome, providing a critical resource for identifying the genetic elements that control gene expression in sheep tissues.

Technical Abstract: The overall aim of the Ovine FAANG project is to provide a comprehensive annotation of the new highly contiguous sheep reference genome sequence (Oar rambouillet v1.0). Mapping of transcription start sites (TSS) is a key first step in understanding transcript regulation and diversity. Using 56 tissue samples collected from the reference ewe Benz2616, we performed a global analysis of TSS and TSS-Enhancer clusters using Cap Analysis Gene Expression (CAGE) sequencing. CAGE measures RNA expression by 5’ cap-trapping and has been specifically designed to allow the characterization of TSS within promoters to single-nucleotide resolution. We adapted an analysis pipeline that uses TagDust2 for clean-up and trimming, Bowtie2 for mapping, CAGEfightR for clustering and the Integrative Genomics Viewer (IGV) for visualization. Mapping of CAGE tags indicated that the expression levels of CAGE tag clusters varied across tissues. Expression profiles across tissues were validated using corresponding polyA+ mRNA-Seq data from the same samples. After removal of CAGE tags with < 10 read counts, 39.3% of TSS overlapped with 5’ ends of transcripts, as annotated previously by NCBI. A further 14.7% mapped to within 50 bp of annotated promoter regions. Intersecting these predicted TSS regions with annotated promoter regions (±50 bp) revealed that 46% of the predicted TSS were ‘novel’ and previously un-annotated. Using whole genome bisulphite sequencing data from the same tissues, we determined that a proportion of these ‘novel’ TSS (32.2%) were hypo-methylated, indicating that they are likely to be reproducible rather than ‘noise’. This global analysis of TSS in sheep will significantly enhance the annotation of gene models in the new ovine reference assembly. Our analyses provide one of the highest resolution annotations of transcript regulation and diversity in a livestock species to date.
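
The filtering and classification step described in the abstract (drop tags with fewer than 10 reads, then label each remaining TSS as matching an annotated 5’ end, lying within 50 bp of one, or ‘novel’) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relied on TagDust2, Bowtie2 and CAGEfightR; it assumes a single chromosome and strand, single-base TSS positions rather than tag clusters, and a hypothetical function name `classify_tss` with made-up example coordinates.

```python
from bisect import bisect_left


def classify_tss(predicted, annotated, min_count=10, window=50):
    """Label predicted TSS against annotated 5' ends (illustrative only).

    predicted: list of (position, read_count) tuples (hypothetical input shape)
    annotated: list of annotated 5' end positions on the same chromosome/strand
    """
    annotated = sorted(annotated)
    labels = {}
    for pos, count in predicted:
        if count < min_count:
            continue  # discard low-support tags (< min_count reads)
        i = bisect_left(annotated, pos)
        # Only the annotated sites immediately left and right can be nearest.
        neighbours = annotated[max(i - 1, 0): i + 1]
        dist = min((abs(pos - a) for a in neighbours), default=None)
        if dist == 0:
            labels[pos] = "annotated"       # coincides with an annotated 5' end
        elif dist is not None and dist <= window:
            labels[pos] = "near_annotated"  # within +/- window bp
        else:
            labels[pos] = "novel"           # no annotation within the window
    return labels


# Made-up coordinates, for illustration only.
example = classify_tss(
    predicted=[(100, 25), (130, 12), (500, 40), (900, 3)],
    annotated=[100, 160, 1000],
)
```

A real analysis would also need to handle multiple chromosomes, strand, and cluster intervals rather than single positions; the sketch shows only the thresholding and the three-way split reported in the abstract.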