Research Project: Enhanced Characterization of Sequence Differences Among Salmonella isolates within SNP Clusters Identified by the NCBI Pathogen Detection System

Location: Meat Safety and Quality

Project Number: 3040-42000-020-009-T
Project Type: Trust Fund Cooperative Agreement

Start Date: Mar 1, 2023
End Date: Jun 30, 2024

Objective:
(1) to develop methodology to catalogue Salmonella SNP cluster diversity in the National Center for Biotechnology Information’s (NCBI) Pathogen Detection Isolates Browser (PDIB) with the goal of producing a white paper to increase industry use and understanding of this powerful tool, and (2) to enhance public health actions and general understanding of Salmonella genomics by identifying target isolates for closed genome sequencing.

Approach:
We will use data from the NCBI PDIB to examine Salmonella SNP cluster genetic diversity. To capture the scope of Salmonella genetic diversity within the meat industry, we will focus our analyses on the three leading serotypes isolated from beef, swine, chicken, and turkey products, as identified by previous studies: Montevideo, Dublin, and Anatum for beef; Infantis, B:i:null, and Anatum for swine; Infantis, Kentucky, and Enteritidis for chicken; and Schwarzengrund, Uganda, and Infantis for turkey. Within each serotype-commodity group, we will select five SNP clusters that each contain a closed genome to use as a reference. For each cluster, we will select the five most closely related isolates (with preference for isolates with 0 SNP differences) that were collected from distant geographic regions or timepoints for inclusion in a detailed comparative analysis. This design (12 serotype-commodity groups, five clusters per group, five isolates per cluster) will result in the examination of 60 total clusters and 300 isolates. While the NCBI PDIB phylogenetic analysis is based on core genomic sequence, we will instead compare our selected isolates against the closed genome sequence (and any associated plasmid sequences) from the cluster. This will allow identification of additional sequence complexity, including bacteriophages, genomic islands, and other indels that the NCBI PDIB analysis ignores, and will provide a clearer picture of the true diversity within each SNP cluster. Under current frameworks, an analysis of this size would carry a high computational cost and require a significant amount of analysis by trained scientists. Funding of this proposal will allow researchers to develop automated, machine learning-based approaches to support and accelerate detailed SNP diversity analyses typically done by hand.
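
The sampling design described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual pipeline. The serotype lists and the counts of five clusters and five isolates come from the text; the Isolate record, its field names, and the greedy selection rule in select_isolates are hypothetical assumptions used to show one way that "closest isolates from distant regions or timepoints" might be put into practice.

```python
from dataclasses import dataclass

# Serotypes per commodity, as listed in the Approach text.
SEROTYPES_BY_COMMODITY = {
    "beef": ["Montevideo", "Dublin", "Anatum"],
    "swine": ["Infantis", "B:i:null", "Anatum"],
    "chicken": ["Infantis", "Kentucky", "Enteritidis"],
    "turkey": ["Schwarzengrund", "Uganda", "Infantis"],
}
CLUSTERS_PER_GROUP = 5
ISOLATES_PER_CLUSTER = 5


def design_totals(serotypes=SEROTYPES_BY_COMMODITY,
                  clusters_per_group=CLUSTERS_PER_GROUP,
                  isolates_per_cluster=ISOLATES_PER_CLUSTER):
    """Return (groups, clusters, isolates) implied by the design."""
    n_groups = sum(len(names) for names in serotypes.values())
    n_clusters = n_groups * clusters_per_group
    n_isolates = n_clusters * isolates_per_cluster
    return n_groups, n_clusters, n_isolates


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record; the field names are assumptions for illustration."""
    accession: str
    snp_distance: int  # SNP differences from the cluster's closed reference
    region: str
    year: int


def select_isolates(candidates, n=ISOLATES_PER_CLUSTER):
    """Assumed greedy heuristic, not the project's stated method.

    Rank candidates by SNP distance (0 first), then prefer isolates that
    add a (region, year) combination not yet represented. If that leaves
    fewer than n isolates, fill the remainder with the closest leftovers.
    """
    ranked = sorted(candidates, key=lambda i: (i.snp_distance, i.accession))
    chosen, seen = [], set()
    # First pass: closest isolates that add a new region/year combination.
    for iso in ranked:
        if len(chosen) == n:
            break
        key = (iso.region, iso.year)
        if key not in seen:
            chosen.append(iso)
            seen.add(key)
    # Second pass: fill any remaining slots with the closest unused isolates.
    for iso in ranked:
        if len(chosen) == n:
            break
        if iso not in chosen:
            chosen.append(iso)
    return chosen


if __name__ == "__main__":
    groups, clusters, isolates = design_totals()
    print(f"{groups} groups, {clusters} clusters, {isolates} isolates")
```

Run as written, design_totals() reproduces the 60 clusters and 300 isolates stated in the Approach. A real selection step would likely weigh geographic and temporal distance more carefully than the exact-match rule on (region, year) used here.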