Location: Characterization and Interventions for Foodborne Pathogens
Title: Evaluation of long-read sequencing simulators to assess real-world applications for food safety
Submitted to: Foods
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/19/2023
Publication Date: 12/19/2023
Citation: Counihan, K.L., Kanrar, S., Tilman, S.M., Gehring, A.G. 2023. Evaluation of long-read sequencing simulators to assess real-world applications for food safety. Foods. 13(1):16. https://doi.org/10.3390/foods13010016.
DOI: https://doi.org/10.3390/foods13010016

Interpretive Summary: Shiga toxin-producing Escherichia coli (STEC) and Listeria monocytogenes (L. mono) are two types of bacteria that can be found in meat and cause severe illness if eaten. Current test methods require at least four days to identify STEC and six days to identify L. mono. Long-read, whole genome sequencing is a new technology that can determine the DNA sequences present in a meat sample. If any DNA from disease-causing bacteria is present, it can be identified within hours. Sequencing could significantly reduce the time needed for identification, but method development costs are high. Therefore, the goal of this project was to use computer simulations to determine whether sequencing would be a practical method. The results suggested that sequencing could be used to detect disease-causing bacteria in meat after the bacteria had 12-24 hours to grow prior to sequencing. The total testing time would be less than 48 hours, a significant decrease from the current methods, which would allow products to arrive at market faster and reduce spoilage. Additionally, the results from the computer simulations will reduce the time and expense associated with laboratory experimentation to develop sequencing methods.

Technical Abstract: Background: Shiga toxin-producing Escherichia coli (STEC) and Listeria monocytogenes are responsible for severe foodborne illnesses in the United States. Current identification methods require at least four days to identify STEC and six days for L. monocytogenes. Adoption of long-read, whole genome sequencing for testing could significantly reduce the time needed for identification, but method development costs are high. Therefore, the goal of this project was to use NanoSim-H software to simulate Oxford Nanopore sequencing reads to assess the feasibility of sequencing-based foodborne pathogen detection and guide experimental design.

Results: Sequencing reads were simulated for STEC, L. monocytogenes, and a 1:1 combination of STEC and Bos taurus genomes using NanoSim-H. At least 2,500 simulated reads were needed to identify the seven genes of interest targeted in STEC, and at least 500 reads were needed to detect the gene targeted in L. monocytogenes. Genome coverage of 30x was estimated to require 21,521 reads for STEC and 11,802 reads for L. monocytogenes. Approximately 5-6% of the reads simulated from each bacterium did not align to the respective reference genome because of the errors introduced by the simulator. For the STEC and B. taurus 1:1 genome mixture, all genes of interest were detected with 1,000,000 reads, but less than 1x coverage was obtained.

Conclusions: The results suggested that sample enrichment would be necessary to detect foodborne pathogens with long-read sequencing, but that this would still decrease the time needed relative to current methods. Additionally, simulation data will be useful for reducing the time and expense associated with laboratory experimentation.
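The read counts quoted for 30x coverage can be related to genome size and read length through the standard average-depth relation (coverage = number of reads x mean read length / genome size). The following is a minimal sketch of that arithmetic only, not the authors' procedure. The function names are hypothetical, and the genome size and mean read length used below are illustrative placeholders that are not taken from the study.

```python
import math


def coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return num_reads * mean_read_length / genome_size


def reads_for_coverage(target_coverage: float, mean_read_length: float,
                       genome_size: int) -> int:
    """Smallest whole number of reads whose total bases reach the target depth."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Placeholder inputs (assumptions for illustration, NOT values from the study).
GENOME_SIZE_BP = 5_000_000      # hypothetical bacterial genome length
MEAN_READ_LENGTH_BP = 10_000    # hypothetical mean long-read length

needed = reads_for_coverage(30, MEAN_READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"Reads needed for 30x: {needed}")
print(f"Depth at that read count: "
      f"{coverage(needed, MEAN_READ_LENGTH_BP, GENOME_SIZE_BP):.1f}x")
```

This relation ignores reads that fail to align and any variation in read length, so it gives an idealized estimate of the number of reads required rather than a measured one.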