
Research Project: Development of Tools, Models and Datasets for Genome-enabled Studies of Bacterial Phytopathogens

Location: Emerging Pests and Pathogens Research

2020 Annual Report


Objectives
Objective 1: Develop datasets and computational tools to facilitate the study of large-scale genomic and pan-genomic features of plant-associated bacteria, including genomic islands and virulence pathways. [NP303, C2, PS2A]
Subobjective 1A: Develop deep proteogenomic datasets to guide the annotation of poorly characterized type strains and field isolates of selected strains of bacterial plant pathogens and other plant-associated bacteria.
Subobjective 1B: Develop or refine annotation methods for genomic regions of anomalous nucleotide composition and for the systems-level analysis of pathways related to virulence and adaptation to plant-associated niches.
Objective 2: Identify genes and candidate transcription factor binding sites using comparative genomics and available ChIP-Seq, RNA-Seq, and proteomics datasets, and ensure that gene calls include experimental evidence whenever appropriate. [NP303, C2, PS2A]
Subobjective 2A: Extend comparative genomics methods to propagate experimentally supported genome annotation updates from targeted bacterial strains to related strains.
Subobjective 2B: Leverage proteomics and other high-throughput datasets, along with comparative genomics methods, to identify conserved motifs representing candidate promoters and other regulatory binding sites.
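Subobjective 1B refers to genomic regions of anomalous nucleotide composition, which often mark horizontally acquired genomic islands. A minimal sketch of one common first-pass screen, a sliding-window GC-content scan that flags windows deviating strongly from the genome-wide mean, is shown below. The function names, window size, step, and z-score cutoff are illustrative assumptions, not parameters used in this project.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def anomalous_windows(genome: str, window: int = 5000, step: int = 2500,
                      z_cutoff: float = 2.0):
    """Return (start, end, gc, z) for windows whose GC content deviates from
    the mean over all windows by more than z_cutoff standard deviations.

    Window/step/cutoff defaults are hypothetical, chosen only for illustration.
    """
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    windows = [(s, min(s + window, len(genome))) for s in starts]
    gcs = [gc_fraction(genome[s:e]) for s, e in windows]
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:  # perfectly uniform composition: nothing stands out
        return []
    flagged = []
    for (s, e), gc in zip(windows, gcs):
        z = (gc - mu) / sigma
        if abs(z) > z_cutoff:
            flagged.append((s, e, round(gc, 3), round(z, 2)))
    return flagged


# Usage: a GC-rich synthetic "chromosome" with a 200-bp AT-rich insert.
genome = "GCGCAT" * 1000 + "AATT" * 50 + "GCGCAT" * 1000
for hit in anomalous_windows(genome, window=100, step=50):
    print(hit)
```

Composition alone produces false positives, so in practice a scan like this would be only one signal, combined with others such as codon usage or comparative presence/absence across related strains.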


Approach
A good genome annotation includes a complete set of biological components (e.g., coding and non-coding genes) and a description of the interactions between them (e.g., promoters and binding sites for transcriptional regulators). Achieving this level of detail relies on painstaking experimental investigation of individual genes and their regulation, a luxury enjoyed by only a handful of model organisms such as Escherichia coli, Pseudomonas aeruginosa, and Bacillus subtilis. The goal of this project is to use proteomics and other evidence-based computational analyses to rapidly produce high-quality bacterial genome annotations that biologists can use to design experiments and interpret experimental results. Our primary goal is to develop high-quality genomic resources for field isolates currently causing disease outbreaks, including Clavibacter michiganensis, Pantoea ananatis, Xylella fastidiosa, and Dickeya species. In addition, we will use existing and novel computational methods to establish pipelines for propagating our experimentally driven genome annotations to other members of their clades, with special emphasis on pathways related to virulence and fitness. This work will be conducted in collaboration with the Prokaryotic Genome Annotation Pipeline (PGAP) team at the National Center for Biotechnology Information (NCBI). In this manner, improvements to a small number of genomes will result in improvements to thousands of genome annotations. Both of these objectives build on our prior experience leading experimental and computational efforts to develop genomic resources for P. syringae pv. tomato DC3000.


Progress Report
This is the report for project 8062-22000-043-00D, which terminated in February 2020 and continues under project 8062-21000-042-00D. For additional information, see the 8062-21000-042-00D project report. The objectives of this project were (1) to identify important strains of phytopathogenic bacteria and to develop and refine their genome sequences and annotations using high-throughput methods (e.g., genomics, proteomics, transcriptomics), and (2) working with NCBI, to leverage these high-quality genomes to improve the annotations of other closely related strains. Because of a vacancy and a delay in hiring, the second objective could not be pursued. During the first two reporting periods, the project scientists developed collaborations with scientists inside and outside the Agency who specialize in particular phytopathogenic bacteria. A working relationship was also established with other ARS researchers in Ithaca, New York, who specialize in protein chemistry and proteomics. These collaborations led to the fairly rapid development of proteomics datasets for Clavibacter michiganensis, which causes bacterial canker in tomatoes; Pantoea ananatis, which causes center rot in onions; and Pseudomonas syringae pv. tomato, which causes bacterial speck in tomatoes. A manuscript was published describing some of these datasets, the software pipelines developed for analyzing them, the genome annotation changes they support, and the biological significance of these changes. Datasets were also developed for Xylella fastidiosa, which causes Pierce’s disease in grapes, and for Liberibacter crescens. The methods developed for preparing samples for proteomics analysis worked very well, except where growing specimens to sufficient concentration proved difficult (e.g., X. fastidiosa). To identify secreted proteins, we attempted to develop separate datasets for bacterial pellets and supernatants; however, efforts to find a robust molecular and/or computational method for distinguishing secreted proteins from those released by cell lysis proved unsatisfactory, so this approach was abandoned.
We performed a complete proteogenomic analysis of Clavibacter michiganensis subsp. michiganensis. Under the tested conditions, our analysis called approximately 70% of the annotated proteins from this organism “present.” Fifty-nine existing gene annotations were found to be too short and needed to be updated. In addition, 26 previously unannotated proteins were discovered; some lie in a genomic region known to be involved in disease progression, and others are predicted to be membrane-bound and may be involved in survival in the plant environment. Expression of the novel genes was confirmed by an independent method (PCR). These results contribute to the understanding of this important pathogen of a number of specialty crops and demonstrate the basic utility and viability of the proteogenomics approach.
Because a high-quality genome sequence is essential for any proteogenomics analysis, and high-quality genome sequences do not exist for many important bacterial strains, we spent considerable effort mastering and refining techniques for quickly producing genomes by hybrid assembly, which yields relatively error-free, closed genomes by combining so-called long- and short-read datasets. For the short reads, we used Illumina NextSeq and MiSeq instruments in Cornell’s core facilities; for the long reads, we used PacBio instruments. We have also invested in an Oxford Nanopore MinION instrument so that we can perform these sequencing reactions in our own laboratory. At this point, genome sequencing of culturable bacteria is an established, standard method, which we plan to leverage in future research.
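The report does not state the criterion used to call a Clavibacter protein “present.” Purely as an illustration, the sketch below assumes one common convention: a protein counts as present when at least two distinct observed peptides map to it and to no other annotated protein, so that shared peptides cannot inflate the count. The function name, the threshold, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def call_present(proteins: dict[str, str], peptides: set[str],
                 min_unique: int = 2):
    """Return (present_ids, fraction_present).

    A protein is called present when at least `min_unique` observed peptides
    map to it and to no other protein (a hypothetical criterion).
    """
    unique_hits = defaultdict(set)
    for pep in peptides:
        matches = [pid for pid, seq in proteins.items() if pep in seq]
        if len(matches) == 1:  # peptides shared by several proteins are ignored
            unique_hits[matches[0]].add(pep)
    present = sorted(pid for pid, peps in unique_hits.items()
                     if len(peps) >= min_unique)
    fraction = len(present) / len(proteins) if proteins else 0.0
    return present, fraction


# Usage with made-up sequences: TAYIAK is shared by p1 and p3, so it is ignored.
proteins = {"p1": "MKTAYIAKQRQISFVK", "p2": "MSEQLKNALR", "p3": "MKTAYIAKGGR"}
peptides = {"TAYIAK", "QISFVK", "NALR", "SEQLK"}
print(call_present(proteins, peptides))
```

Raising or lowering `min_unique` trades sensitivity against confidence; whatever rule is used, the reported fraction “present” depends on it as much as on the data.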
To date, we have produced high-quality genome sequences and annotations for strains of Pantoea agglomerans and Pantoea ananatis that are pathogenic to onions. Since joining the ‘042D project, we have collaborated on sequencing a number of strains of Pectobacterium, Dickeya, and Leifsonia. Working with university collaborators, we have studied the so-called “HiVir” genes in Pantoea. This set of genes has been shown to be essential for Pantoea ananatis to cause disease symptoms in onions. These genes appear to encode a pathway that produces a phosphonate, but the exact identity and function of the product have not been established. Proteomics evidence suggests that a number of the genes in this region produce proteins, but this evidence does not make clear whether all of the genes are transcribed and translated under the tested conditions or whether the entire pathway is operational. In this reporting period, we isolated RNA from Pantoea strains in order to determine reliable conditions under which all of the HiVir genes are expressed. We have also determined, through genome sequencing, that intact HiVir regions are present in certain strains of Pantoea agglomerans. This finding suggests that HiVir is more widespread among onion-pathogenic species of Pantoea than previously thought. Work continues under the ‘042D project to elucidate the extent, regulation, and function of HiVir in onion-pathogenic strains of Pantoea.
One shortcoming of proteogenomics analysis based on bottom-up, or shotgun, proteomics is that it cannot unambiguously identify the N-terminal ends of proteins. In the bottom-up approach, proteins are digested into shorter oligopeptides using, for instance, trypsin. When a peptide is observed upstream of the N-terminal end of an annotated protein, it can be inferred that the annotation is too short, but it is not always possible to say how long the annotation should be.
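The start-site ambiguity described above can be sketched as follows: once a peptide maps upstream of an annotated start, every in-frame start codon between the nearest upstream in-frame stop codon and that peptide remains a possible true start. This minimal illustration handles the forward strand only, uses 0-based coordinates and the standard bacterial amino-acid assignments (NCBI table 11), and considers only the common starts ATG, GTG, and TTG; the function names and example sequence are hypothetical.

```python
# Codon table in TCAG order (standard code; table 11 has the same amino acids).
CODONS = [a + b + c for a in "TCAG" for b in "TCAG" for c in "TCAG"]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = dict(zip(CODONS, AMINO))
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: ignore rarer bacterial starts


def translate(nt: str) -> str:
    """Translate complete codons; ambiguous codons become 'X'."""
    return "".join(TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def candidate_starts(genome: str, annotated_start: int, peptide: str) -> list[int]:
    """0-based in-frame start-codon positions compatible with a peptide
    observed upstream of the annotated start (forward strand only)."""
    genome = genome.upper()
    # Walk upstream codon by codon until an in-frame stop or the contig edge.
    region_start = annotated_start
    pos = annotated_start - 3
    while pos >= 0 and genome[pos:pos + 3] not in STOPS:
        region_start = pos
        pos -= 3
    # Include enough downstream codons for a peptide straddling the old start.
    protein = translate(genome[region_start:annotated_start + 3 * len(peptide)])
    idx = protein.find(peptide)
    if idx == -1:
        return []
    pep_nt = region_start + 3 * idx
    if pep_nt >= annotated_start:
        return []  # peptide lies inside the annotation: no evidence it is short
    # A start exactly at the peptide's first codon is only possible if the
    # peptide begins with the initiator methionine.
    return [p for p in range(region_start, pep_nt + 1, 3)
            if genome[p:p + 3] in STARTS
            and (p < pep_nt or peptide.startswith("M"))]


# Usage: stop at 0, ATG at 3, GTG at 6, annotated ATG start at 18.
genome = "TAAATGGTGTTGAAACGTATGGCCAAATGA"
print(candidate_starts(genome, 18, "LKR"))
```

Here the peptide LKR proves the annotation is too short, but two candidate starts (positions 3 and 6) remain, which is exactly the ambiguity that bottom-up data alone cannot resolve.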
The same ambiguity applies when determining the N-termini of novel proteins. An alternative to bottom-up proteomics is top-down proteomics, in which mass spectrometry is performed on undigested proteins. When a protein is identified using the top-down approach, both the N- and C-termini of that particular protein are unambiguously identified as well. Top-down proteomics requires a specialized mass spectrometer with very high resolution that is capable of ionizing relatively large molecules (i.e., intact proteins). Fortunately, Cornell recently acquired such an instrument, and because of our interest and expertise in proteogenomics, we were given early access to perform several of the first experiments on it. Working with ARS and university collaborators, we have developed methods for isolating bacterial protein samples and for separating each sample into a number of subsamples (to reduce the complexity of each subsample analyzed by the mass spectrometer), as well as software pipelines for analyzing and interpreting the mass spectrometry results. We have performed a preliminary analysis of the Dickeya dadantii 3937 proteome and demonstrated that this method can unambiguously identify the N-termini of a number of proteins. Furthermore, we have used this dataset to identify a number of post-translational modifications of Dickeya proteins that were not, to our knowledge, previously described. This work will continue under the 8062-21000-042-00D project. The Dickeya results are limited and preliminary; our goal is to use top-down methods to cover as much of the Dickeya proteome as is possible with bottom-up methods. To this end, we are currently working on a bottom-up analysis of the Dickeya proteome. This dataset will be useful on its own for a proteogenomics analysis of this strain, and also for comparing the strengths and weaknesses of the two approaches.
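To show why an intact-mass measurement can settle the N-terminus where bottom-up peptides cannot, the sketch below compares an observed intact mass against the masses predicted for alternative start sites, with and without removal of the initiator methionine, and with one example modification (N-terminal acetylation, +42.0106 Da). It assumes the observed value is already a deconvolved, neutral monoisotopic mass; real top-down search software also handles charge-state deconvolution, isotope peaks, and many more modifications. The function names, the 10 ppm tolerance, and the example sequences are hypothetical.

```python
# Monoisotopic amino-acid residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
MOD_SHIFTS = {"none": 0.0, "N-acetyl": 42.010565}  # illustrative subset only


def neutral_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified protein or peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def match_proteoforms(candidates: dict[str, str], observed: float,
                      ppm: float = 10.0):
    """Return (label, met_removed, modification, mass, error_ppm) for every
    candidate proteoform within `ppm` of the observed intact mass."""
    hits = []
    for label, seq in candidates.items():
        forms = [(False, seq)]
        if seq.startswith("M") and len(seq) > 1:
            forms.append((True, seq[1:]))  # initiator Met cleaved
        for met_removed, form in forms:
            base = neutral_mass(form)
            for mod, shift in MOD_SHIFTS.items():
                mass = base + shift
                err = (observed - mass) / mass * 1e6
                if abs(err) <= ppm:
                    hits.append((label, met_removed, mod,
                                 round(mass, 4), round(err, 2)))
    return hits


# Usage: two hypothetical start sites for the same gene.
candidates = {"annotated": "MAKGE", "extended": "MVLKRMAKGE"}
observed = neutral_mass("VLKRMAKGE")  # stand-in for a measured intact mass
print(match_proteoforms(candidates, observed))
```

Because the whole protein is weighed, only one combination of start site and processing fits the measured mass, so the N-terminus (and here, Met removal) is determined directly rather than inferred.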
We also plan to complete an RNA-Seq analysis of the Dickeya strain under the same conditions used for the proteomics analyses. Having this dataset will enable us to develop “multi-omics” methods and datasets for this important potato pathogen.


Accomplishments


Review Publications
Liu, Y., Ma, X., Helmann, T.C., Mclane, H., Stodghill, P., Swingle, B.M., Filiatrault, M.J., Perry, K.L. 2020. Complete genome sequence of a gram positive bacterium Leifsonia sp. strain PS1209, a potato endophyte. Microbiology Resource Announcements. 9(26). https://doi.org/10.1128/MRA.00447-20.
Swingle, B.M., Perna, N.T., Glasner, J.D., Hao, J., Johnson, S., Charkowski, A., Perry, K.L., Stodghill, P. 2019. Complete genome sequence of the potato blackleg pathogen Dickeya dianthicola ME23. Microbiology Resource Announcements. 8(7). https://doi.org/10.1128/MRA.01526-18.