Research Project: Intervention Strategies to Mitigate the Food Safety Risks Associated with the Fresh Produce Supply Chain

Location: Environmental Microbial & Food Safety Laboratory

Title: Foodborne salmonellosis outbreak severity prediction based on genetic and meteorological trends using machine learning

Authors
KARANTH, SHRADDHA - University of Maryland
Patel, Jitu
SHIRMOHAMMADI, ADEL - University of Maryland
PRADHAN, ABANI - University of Maryland

Submitted to: International Association for Food Protection
Publication Type: Abstract Only
Publication Acceptance Date: 3/21/2022
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: Introduction: Several studies have shown a correlation between outbreaks of Salmonella enterica and meteorological trends, especially temperature and precipitation. Moreover, current outbreak studies pool data at the Salmonella species level without accounting for intra-species genetic heterogeneity.
Purpose: The purpose of this study was to analyze and quantify the effects of differential gene expression and a suite of meteorological factors on salmonellosis outbreak severity (measured as the number of cases) using a combination of machine learning and count modeling methods.
Methods: Salmonella outbreak data and the corresponding meteorological data were obtained from the National Outbreak Reporting System and the National Climatic Data Center databases, respectively. Salmonella whole genome sequences obtained from the Pathogen Detection database were used to construct a pan-genome. Elastic Net regularization was used to identify significant genes, and a multi-variable Poisson regression was developed to fit the individual and combined (gene-meteorological interaction) effects. Best-fit models were identified using pseudo R² values, and significance was set at p < 0.05.
Results: The best-fit Elastic Net model (α = 0.5000; λ = 2.18399) identified 53 significant gene features. The final multi-variable Poisson regression model (χ² = 5748.22; pseudo R² = 0.6688; probability > χ² = 0.0000) identified 127 significant predictor terms (p < 0.10): 45 gene-only predictors; average temperature, average precipitation, and snow cover; and 79 gene-meteorological interaction terms. The significant genes spanned a range of functions, including cellular signaling and transport, virulence, metabolism, and stress response, and included genes that the baseline model did not identify as significant.
Significance: The results of this study indicate the need to co-evaluate genomic and environmental data to develop a more holistic model for predicting disease outcome severity, which could extend to re-evaluating risks to human health.
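The abstract reports an Elastic Net with a mixing weight α and a penalty strength λ but does not state the loss function or software used. As a minimal sketch, assuming a Gaussian (squared-error) loss and the glmnet-style α/λ parametrization, the pure-Python coordinate-descent routine below shows how the pair shrinks coefficients and sets uninformative gene columns exactly to zero. The 0/1 gene presence matrix, the function names, and the toy data are hypothetical. Unlike glmnet's default, columns are centered but not rescaled to unit variance.

```python
"""Minimal elastic net feature screen (glmnet-style alpha/lambda), pure Python.

Hypothetical illustration: X is a 0/1 gene presence/absence matrix
(rows = outbreaks, columns = pan-genome genes) and y is a response.
"""
from typing import List, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X: List[List[float]], y: List[float], alpha: float, lam: float,
                n_iter: int = 500, tol: float = 1e-8) -> Tuple[float, List[float]]:
    """Coordinate descent for the Gaussian elastic net objective
    (1/2n)||y - b0 - X b||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center columns and response so the intercept can be recovered afterwards.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [yi - y_mean for yi in y]  # residual while every coefficient is zero
    v = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if v[j] == 0.0:
                continue  # constant column, e.g. a core gene present in every isolate
            # Inner product of column j with the partial residual that excludes gene j.
            z = sum(Xc[i][j] * r[i] for i in range(n)) / n + v[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / (v[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep r equal to y_c - Xc @ beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


if __name__ == "__main__":
    # Invented data: 6 outbreaks x 3 genes; the response depends only on gene 0.
    X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
    y = [5.0, 5.0, 1.0, 1.0, 5.0, 1.0]
    for lam in (0.01, 100.0):
        b0, beta = elastic_net(X, y, alpha=0.5, lam=lam)
        selected = [j for j, b in enumerate(beta) if b != 0.0]
        print(f"lambda={lam}: intercept={b0:.3f}, selected genes={selected}")
```

With a small λ the informative gene keeps a large coefficient, while a large λ drives every coefficient to zero; in practice λ is usually chosen by cross-validation. The objective above has the same form as scikit-learn's `ElasticNet`, where that class's `alpha` plays the role of λ here and its `l1_ratio` the role of α.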
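The final model is described only as a multi-variable Poisson regression with gene, meteorological, and gene-by-meteorological interaction terms, summarized by a χ² statistic and a pseudo R². The sketch below assumes a log-link Poisson GLM fitted by Newton-Raphson, a likelihood-ratio statistic χ² = 2(ℓ_full - ℓ_null), and McFadden's pseudo R² = 1 - ℓ_full/ℓ_null (the definitions reported by, for example, Stata's `poisson` command). Whether the study used exactly these definitions is an assumption. The single gene indicator, the centered temperature, their product term, and the case counts are all invented for illustration and are not the study's variables or results.

```python
"""Poisson regression with a gene x weather interaction, fitted by Newton-Raphson.

Hypothetical illustration only: variables and data are made up.
"""
import math
from typing import List, Tuple


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def predict(beta: List[float], X: List[List[float]]) -> List[float]:
    """Expected counts mu = exp(X @ beta) under the log link."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]


def fit_poisson(X: List[List[float]], y: List[float], n_iter: int = 50) -> List[float]:
    """Maximum-likelihood Poisson regression; column 0 of X must be the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = predict(beta, X)
        score = [sum((yi - m) * row[j] for yi, m, row in zip(y, mu, X)) for j in range(p)]
        info = [[sum(m * row[j] * row[k] for m, row in zip(mu, X)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)  # Newton step: info^-1 @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def log_lik(y: List[float], mu: List[float]) -> float:
    """Poisson log-likelihood, including the -log(y!) constant."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1.0) for yi, m in zip(y, mu))


def fit_summary(X: List[List[float]], y: List[float]) -> Tuple[List[float], float, float]:
    """Return (coefficients, LR chi-square vs. intercept-only, McFadden pseudo R2)."""
    beta = fit_poisson(X, y)
    ll_full = log_lik(y, predict(beta, X))
    y_bar = sum(y) / len(y)
    ll_null = log_lik(y, [y_bar] * len(y))  # intercept-only MLE predicts the mean
    return beta, 2.0 * (ll_full - ll_null), 1.0 - ll_full / ll_null


def design_row(gene: int, temp: float) -> List[float]:
    """Hypothetical design: intercept, gene indicator, temperature, gene x temperature."""
    return [1.0, float(gene), temp, gene * temp]


if __name__ == "__main__":
    # Invented records: (gene present?, centered temperature, case count).
    records = [(0, -2.0, 2), (0, -1.0, 3), (0, 0.0, 3), (0, 1.0, 4), (0, 2.0, 4),
               (1, -2.0, 2), (1, -1.0, 4), (1, 0.0, 6), (1, 1.0, 9), (1, 2.0, 14)]
    X = [design_row(g, t) for g, t, _ in records]
    y = [float(c) for _, _, c in records]
    beta, lr_chi2, pseudo_r2 = fit_summary(X, y)
    print("coefficients:", [round(b, 3) for b in beta])
    print(f"LR chi2 = {lr_chi2:.2f}, pseudo R2 = {pseudo_r2:.4f}")
```

For the canonical log link, Newton-Raphson coincides with the iteratively reweighted least squares used by most GLM libraries. A full analysis would also report Wald standard errors from the inverse of the information matrix, which this sketch omits.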