Research Project: Development of Tools, Models and Datasets for Genome-enabled Studies of Bacterial Phytopathogens

Location: Emerging Pests and Pathogens Research

2017 Annual Report


Objectives
Objective 1: Develop datasets and computational tools to facilitate the study of large-scale genomic and pan-genomic features of plant-associated bacteria, including genomic islands and virulence pathways. [NP303, C2, PS2A]
Subobjective 1A: Develop deep proteogenomic data sets to guide the annotation of poorly characterized type strains and field isolates of select strains of bacterial plant pathogens and other plant-associated bacteria.
Subobjective 1B: Develop or refine annotation methods for genomic regions of anomalous nucleotide composition and for the systems-level analysis of pathways related to virulence and adaptation to plant-associated niches.
Objective 2: Identify genes and candidate transcription factor binding sites using comparative genomics and available ChIP-Seq, RNA-Seq and proteomics data sets, and ensure that gene calls include experimental evidence whenever appropriate. [NP303, C2, PS2A]
Subobjective 2A: Extend comparative genomics methods to propagate the experimentally supported genome annotation updates from targeted bacterial strains to related strains.
Subobjective 2B: Leverage proteomics and other high-throughput datasets, along with comparative genomics methods, to identify conserved motifs representing candidate promoters and other regulatory binding sites.


Approach
A good genome annotation includes a complete set of biological components (e.g., coding and non-coding genes) and a description of the interactions between them (e.g., promoters and binding sites for transcriptional regulators). Constructing this level of detail relies on painstaking experimental investigations of individual genes and their regulation, a luxury enjoyed by only a handful of model organisms such as Escherichia coli, Pseudomonas aeruginosa, and Bacillus subtilis. The goal of this project is to use proteomics and other evidence-based computational analysis to rapidly produce high-quality bacterial genome annotations that biologists can use to design experiments and interpret experimental results. Our primary goal is to develop high-quality genomic resources for field isolates currently causing disease outbreaks, including Clavibacter michiganensis, Pantoea ananatis, Xylella fastidiosa, and Dickeya species. In addition, we will use existing and novel computational methods to establish pipelines for propagating our experimentally driven genome annotations to other members of their clades, with special emphasis on pathways related to virulence and fitness. This work will be conducted in collaboration with the Prokaryotic Genome Annotation Pipeline (PGAP) team at the National Center for Biotechnology Information (NCBI). In this manner, improvements to a small number of genomes will result in improvements to thousands of genome annotations. Both of these objectives build on our prior experience leading experimental and computational efforts to develop genomic resources for P. syringae pv. tomato DC3000.


Progress Report
Work this year focused on establishing collaborations with scientists from the Ithaca ARS location, the Ft. Detrick ARS location, Cornell University, and Boyce Thompson Institute, and on developing proteomics data sets that will be used for proteogenomic analysis of agriculturally relevant phytopathogenic bacterial strains. Working with our collaborators, we improved our methods for sample preparation, protein extraction, and analysis of spectra from proteomics experiments. We also optimized methods to separate bacterial cell-associated proteins from extracellular proteins in suspension. The method appears to be effective for both Gram-positive and Gram-negative bacteria.
An efficient method was developed for extracting and concentrating bacterial extracellular proteins to a level at which they could be prepared for mass spectrometry analysis. An initial method relied on many centrifugation runs of small amounts of liquid to extract minuscule amounts of protein. This method, although effective, was judged too time-consuming. Special ultra-high-pressure filtration equipment was purchased, which enables larger volumes of liquid to be processed in a much shorter time. The time needed to concentrate the supernatant was reduced by 80%.
Scientists were trained in the use of the software used to analyze all of the mass spectrometry data sets. With access to and familiarity with the software, we re-analyzed some of the mass spectrometry data sets with varying parameters (e.g., different tolerances, different post-translational modifications). This enabled us to identify weaknesses in our proteogenomic analysis methods, which we have addressed.
Ultra high-quality proteomics data sets were collected for Clavibacter michiganensis subsp. michiganensis (Cmm) strain NCPPB382, the causative agent of bacterial wilt and canker in tomatoes and the type strain of Cmm. Both pellet and supernatant were processed and analyzed.
The proteomics data sets were deposited at the PRIDE (ProteomeXchange) repository and are publicly available (accession PXD006787). We have developed a software pipeline to process the proteomics data sets and discover novel and misannotated genes. For example, when analyzing the NCPPB382 data set, we discovered 34 previously unknown and unannotated genes and 82 genes whose annotations require revision. A number of these genes are found in a known pathogenicity island and may be involved in virulence. A manuscript describing this data set and the results of its analysis is being prepared.
Ultra high-quality proteomics data sets were collected for Pseudomonas syringae pv. tomato DC3000, the causative agent of bacterial speck in tomatoes and an important model organism. We are developing an analysis pipeline built using only open source components, and these samples will be fully analyzed with this new pipeline. A manuscript describing this data set is expected in the next reporting period.
An ultra high-quality proteomics data set was collected for Pantoea ananatis LMG20103, the causative agent of center rot in onions. The effectiveness of the protein preparation method was evaluated, and additional samples are being prepared and will be processed shortly. A manuscript describing this data set is expected in a future reporting period.
We worked to improve the quality of the draft genome for Pseudomonas syringae pv. tomato NY15025, an emerging and particularly virulent strain causing bacterial speck of tomatoes in New York. We facilitated the processing of additional sequencing libraries to provide greater depth of coverage for a genome assembly pipeline. The new version of the genome has fewer contigs and fewer assembly artifacts. Ultra high-quality proteomics data sets were collected for NY15025 and will be analyzed using the improved draft genome.
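The report does not describe how the pipeline for discovering novel and misannotated genes is implemented. As a rough illustration of the general proteogenomic idea only, the following minimal Python sketch translates a genome in all six reading frames, locates an identified peptide, and checks whether the hit falls in frame inside an annotated coding sequence. The function names (`locate_peptide`, `classify_hit`), the toy genome, and the annotation intervals are all hypothetical and are not taken from the project's software; a real pipeline would start from search-engine peptide identifications and handle alternative start codons, modifications, and false discovery rates.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def locate_peptide(genome, peptide):
    """Yield (start, end, strand) for every exact six-frame match.

    Coordinates are 0-based, half-open, and always given on the forward strand.
    """
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start, end = n - end, n - start
                yield start, end, strand
                pos = protein.find(peptide, pos + 1)


def classify_hit(hit, annotated_cds):
    """Label a peptide hit relative to hypothetical annotated CDS intervals."""
    start, end, strand = hit
    for cds_start, cds_end, cds_strand in annotated_cds:
        inside = cds_start <= start and end <= cds_end
        in_frame = (start - cds_start) % 3 == 0
        if strand == cds_strand and inside and in_frame:
            return "supports annotation"
    return "candidate novel or misannotated region"


# Toy example: one annotated gene on the forward strand (hypothetical data).
genome = "CCATGAAACCCGGGTTTTAAGG"
annotated_cds = [(2, 20, "+")]
for peptide in ("MKPGF", "KTRV"):
    for hit in locate_peptide(genome, peptide):
        print(peptide, hit, classify_hit(hit, annotated_cds))
```

In this toy case, a peptide that maps in frame within the annotated gene supports the existing gene call, while a peptide that maps only to the opposite strand is flagged for follow-up. Flagged regions are candidates, not conclusions; they would still need manual or comparative review before an annotation is revised.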
Xylella fastidiosa causes a number of plant diseases, including Pierce’s disease, citrus variegated chlorosis, and olive quick decline syndrome. Projects were established to develop high-quality genomes for important Xylella strains using conventional sequencing and proteogenomics methods. We encountered difficulties with the viability of colonies of several strains and with growing other strains. In the next reporting period, we plan to address these difficulties.
We have started a project to develop genomes for important strains of Pseudomonas viridiflava, an emerging pathogen of onion.
We participated in the workshop on the Prokaryotic Genome Annotation Pipeline (PGAP) held by the National Center for Biotechnology Information (NCBI) on the campus of the National Institutes of Health. At this workshop and in separate conference calls, we discussed with the PGAP developers how we might share our genome annotation results and raw proteomics data with them.


Accomplishments


Review Publications
Pineros, M., Larson, B., Shaff, J., Schneider, D.J., Falcao, A., Lixing, Y., Clark, R.T., Craft, E.J., Davis, T.W., Pradier, P., Liu, J., Assaranurak, I., Susan, M., Sturrock, C., Bennett, M., Kochian, L.V. 2016. Advances and considerations in technologies for growing, imaging, and analyzing 3-D root system architecture. Journal of Integrative Plant Biology. 58(3):230-241.
Markel, E.J., Stodghill, P., Bao, Z., Myers, C., Swingle, B.M. 2016. AlgU controls expression of virulence genes in Pseudomonas syringae pv. tomato DC3000. Journal of Bacteriology. 198(17):2330-2344.
Butcher, B., Bao, Z., Wilson, J., Stodghill, P., Swingle, B.M., Filiatrault, M.J., Schneider, D.J., Cartinhour, S.W. 2017. The ECF sigma factor, PSPTO_1043, in Pseudomonas syringae pv. tomato DC3000 is induced by oxidative stress and regulates genes involved in oxidative stress response. PLoS One. DOI: 10.1371/journal.pone.0180340.