Publication #206011

Title: Identification of conserved regulatory elements in upstream promoter regions of mammals at relaxed thresholds by comparative genomics - a case study using PEPCK

Author
Liu, Ge - George
WEIRAUCH, MATTHEW - UNIV OF CA SANTA CRUZ
Van Tassell, Curtis - Curt
Li, Robert
Sonstegard, Tad
MATUKUMALLI, LAKSHMI - GEORGE MASON UNIVERSITY
Connor, Erin
HANSON, RICHARD - CASE WESTERN UNIVERSITY
YANG, JIANQI - CASE WESTERN UNIVERSITY

Submitted to: Genome Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/25/2007
Publication Date: 3/27/2009
Citation: Liu, G., Weirauch, M., Van Tassell, C.P., Li, R.W., Sonstegard, T.S., Matukumalli, L.K., Connor, E.E., Hanson, R.W., Yang, J. 2008. Identification of conserved regulatory elements in mammalian promoter regions: a case study using the PCK1 promoter. Genomics, Proteomics and Bioinformatics. 6(3-4):129-143.

Interpretive Summary: Comparative genomics is the primary method for discovering regulatory elements: it identifies conserved genetic sequences by comparing genomes across species. Except for the most conserved and prominent transcription factor binding sites (TFBS), in silico predictions generally agree poorly with experimental results for most TFBS, particularly for less conserved but biologically active elements that may be relevant to tissue- and temporal-specific transcriptional regulation. Detailed quality control and benchmarking of in silico predictions are currently missing. We designed a systematic approach that combines position weight matrices (PWMs from JASPAR) with a phylogenetic footprinting algorithm (TFLOC) to identify less conserved but biologically active TFBS in mammalian promoter regions. Using human, mouse, and rat promoter sequence alignments as input, we applied this approach to the 1 kb upstream promoter regions of all available RefSeq genes. Computational predictions were compared with previously known sites of PEPCK (phosphoenolpyruvate carboxykinase, cytosolic isoform; pck1). The approach achieved a sensitivity of over 75% and a true-positive rate of about 32%. In addition to correctly predicting previously known TFBS, it revealed some novel candidate sites, which were further confirmed experimentally by gel shift and in vitro reporter assays. This approach provides an accessible resource for developing hypotheses in transcription research, and the TFBS dataset for all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
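
The two benchmark figures can be illustrated with a small calculation. This is a minimal sketch, assuming that "true-positive rate" here means the fraction of predicted sites that match known sites (often called positive predictive value); the summary does not define the term. The site labels and counts are hypothetical and are not data from the study.

```python
def evaluate_predictions(predicted, known):
    """Compare predicted site labels with experimentally known site labels.

    Returns (sensitivity, confirmed_fraction):
      sensitivity        = known sites recovered / all known sites
      confirmed_fraction = predictions matching a known site / all predictions
    """
    predicted, known = set(predicted), set(known)
    true_positives = predicted & known
    sensitivity = len(true_positives) / len(known) if known else 0.0
    confirmed_fraction = len(true_positives) / len(predicted) if predicted else 0.0
    return sensitivity, confirmed_fraction


# Hypothetical labels chosen only to make the arithmetic easy to follow.
known_sites = {"siteA", "siteB", "siteC", "siteD"}
predicted_sites = {
    "siteA", "siteB", "siteC",                             # match known sites
    "siteU", "siteV", "siteW", "siteX", "siteY", "siteZ",  # no known counterpart
}

sensitivity, confirmed_fraction = evaluate_predictions(predicted_sites, known_sites)
print(f"sensitivity = {sensitivity:.2f}")
print(f"fraction of predictions matching known sites = {confirmed_fraction:.2f}")
```

Under this reading, a low confirmed fraction does not necessarily mean the remaining predictions are wrong: unmatched predictions may include genuine sites that have not yet been characterized experimentally.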

Technical Abstract:

Background: Comparative genomics is the primary method for discovering regulatory elements: cross-species genome comparison identifies sequences that are conserved because of evolutionary constraints. Except for the most conserved and prominent transcription factor binding sites (TFBS), in silico predictions are rarely cross-referenced against experimental results for most TFBS. In particular, detailed quality control and benchmarking of in silico predictions are currently missing for less conserved but biologically active elements that may be relevant to tissue- and temporal-specific transcriptional regulation.

Results: A systematic approach that combines position weight matrices (PWMs from JASPAR) with a phylogenetic footprinting algorithm (TFLOC) was implemented to identify less conserved but biologically active TFBS in mammalian promoter regions. Using human, mouse, and rat promoter sequence alignments as input, this approach was applied to the 1 kb upstream promoter regions of all available RefSeq genes. Computational predictions were compared with previously known sites of PEPCK (phosphoenolpyruvate carboxykinase, cytosolic isoform; pck1). The approach achieved a sensitivity of over 75% and a true-positive rate of about 32%. In addition to correctly predicting previously known TFBS, it revealed some novel candidate sites, which were further confirmed experimentally by gel shift and in vitro reporter assays.

Conclusions: The approach features an expandable set of TFBS matrices and an adjustable threshold, and it is compatible with whole-genome analysis. It provides an accessible resource for developing hypotheses in transcription research, and the TFBS dataset for all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
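
The general idea of combining PWM scoring with cross-species conservation can be sketched as follows. This is an illustrative sketch only, not the TFLOC algorithm or the authors' pipeline: it assumes an ungapped three-species alignment, a uniform background, and a simple rule that a site is kept when every species scores at or above an adjustable threshold. The count matrix and sequences are invented for the example and are not a JASPAR profile or real promoter sequence.

```python
import math

BASES = "ACGT"

# Toy count matrix for a hypothetical 4-bp motif with consensus ACGT.
# Rows are motif positions; columns are counts of A, C, G, T.
TOY_COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix to a log2-odds PWM against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append([math.log2(((c + pseudocount) / total) / background) for c in row])
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one sequence window."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))


def conserved_hits(pwm, aligned, threshold):
    """Return alignment offsets where every species scores >= threshold."""
    width = len(pwm)
    length = min(len(seq) for seq in aligned.values())
    hits = []
    for start in range(length - width + 1):
        windows = [seq[start:start + width] for seq in aligned.values()]
        # Skip windows containing gaps or ambiguous bases.
        if any(set(w) - set(BASES) for w in windows):
            continue
        if all(score_window(pwm, w) >= threshold for w in windows):
            hits.append(start)
    return hits


# Invented, pre-aligned sequences of equal length (no gaps).
aligned = {
    "human": "TTACGTGGAA",
    "mouse": "TAACGTGCAA",
    "rat":   "TCACGTCCAA",
}

pwm = to_log_odds(TOY_COUNTS)
print(conserved_hits(pwm, aligned, threshold=4.0))
```

Lowering the `threshold` argument admits weaker matches, which is one simple way to picture a relaxed threshold; requiring the match in all three species is what keeps the number of spurious hits in check in this sketch.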