Location: Emerging Pests and Pathogens Research
Project Number: 8062-22000-043-000-D
Project Type: In-House Appropriated
Start Date: May 9, 2017
End Date: Feb 18, 2020
Objective:
Objective 1: Develop datasets and computational tools to facilitate the study of large-scale genomic and pan-genomic features of plant-associated bacteria, including genomic islands and virulence pathways. [NP303, C2, PS2A]
Subobjective 1A: Develop deep proteogenomic data sets to guide the annotation of poorly characterized type strains and field isolates of select strains of bacterial plant pathogens and other plant-associated bacteria.
Subobjective 1B: Develop or refine annotation methods for genomic regions of anomalous nucleotide composition and the systems-level analysis of pathways related to virulence and adaptation to plant-associated niches.
Objective 2: Identify genes and candidate transcription factor binding sites using comparative genomics and available ChIP-Seq, RNA-Seq, and proteomics data sets, and ensure that gene calls include experimental evidence whenever appropriate. [NP303, C2, PS2A]
Subobjective 2A: Extend comparative genomics methods to propagate the experimentally supported genome annotation updates from targeted bacterial strains to related strains.
Subobjective 2B: Leverage proteomics and other high-throughput datasets, along with comparative genomics methods, to identify conserved motifs representing candidate promoters and other regulatory binding sites.
Approach:
A good genome annotation includes a complete set of biological components (e.g., coding and non-coding genes) and a description of the interactions between them (e.g., promoters and binding sites for transcriptional regulators). Constructing this level of detail relies on painstaking experimental investigations of individual genes and their regulation, a luxury enjoyed by only a small handful of model organisms such as Escherichia coli, Pseudomonas aeruginosa, and Bacillus subtilis.
The goal of this project is to use proteomics and other evidence-based computational analysis to rapidly produce high-quality bacterial genome annotations that biologists can use to design experiments and interpret experimental results. Our primary focus is to develop high-quality genomic resources for field isolates currently causing disease outbreaks, including Clavibacter michiganensis, Pantoea ananatis, Xylella fastidiosa, and Dickeya species.
In addition, we will use existing and novel computational methods to establish pipelines for propagating our experimentally driven genome annotations to other members of their clades, with special emphasis on pathways related to virulence and fitness. This work will be conducted in collaboration with the Prokaryotic Genome Annotation Pipeline (PGAP) team at the National Center for Biotechnology Information (NCBI). In this manner, improvements to a small number of genomes will result in improvements to thousands of genome annotations.
Both objectives build on our prior experience leading experimental and computational efforts to develop genomic resources for P. syringae pv. tomato DC3000.