
Research Project: Ecology and Detection of Human Pathogens in the Produce Production Continuum

Location: Produce Safety and Microbiology Research

Title: imGLAD: accurate detection and quantification of target organisms in metagenomes

Author
CASTRO, JUAN - Georgia Institute Of Technology
RODRIGUEZ, LUIS - Georgia Institute Of Technology
HARVEY, WILLIAM - Georgia Institute Of Technology
WEIGLAND, MICHAEL - Georgia Institute Of Technology
HATT, JANET - Georgia Institute Of Technology
Carter, Michelle
KONSTANTINIDIS, KOSTAS - Georgia Institute Of Technology

Submitted to: PeerJ
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/3/2018
Publication Date: 11/2/2018
Citation: Castro, J.C., Rodriguez, L.M., Harvey, W.T., Weigland, M.R., Hatt, J.K., Carter, M.Q., Konstantinidis, K. 2018. imGLAD: accurate detection and quantification of target organisms in metagenomes. PeerJ. 6:e5882. https://doi.org/10.7717/peerj.5882.
DOI: https://doi.org/10.7717/peerj.5882

Interpretive Summary: Bacterial pathogens evolve rapidly, resulting in the emergence and reemergence of novel foodborne pathogens. This poses significant challenges for public health surveillance because traditional diagnostic testing relies largely on cultivating pathogens or on known genetic markers for molecular detection. These limitations can be overcome if test samples are sequenced directly and screened for any known human pathogens. The biggest challenge in this metagenomics-based method is mining large amounts of sequencing data to extract useful information. In this study, we developed a bioinformatics tool, imGLAD, for rapid analysis of metagenomes derived from environmental samples. imGLAD maps sequence reads against reference genomes and then calculates the likelihood that the target genomes are present, based on selected and validated informative features. We tested this tool with two experimental systems: one detects the enteric pathogen E. coli O157:H7 in metagenomes retrieved from field-grown spinach leaves; the other detects genomes of Bacillus anthracis in a soil sample. imGLAD achieved high accuracy and provided abundance estimates close to the expected ones in both test systems. The detection limit was estimated to be on the order of 100 cells, based on 1-2 Gb of Illumina shotgun sequencing of the microbiome recovered from 100 grams of field-grown spinach leaves, which is comparable to PCR-based approaches. However, this metagenomics-based detection method yielded much more information than PCR-based detection, including the structure and composition of the microbial community associated with the pathogen as well as the overall pathogenicity potential of the sample. Because bacterial pathogenicity is the combined effect of multiple factors, such information can be used not only for epidemiological investigations but also for risk assessments.

Technical Abstract: The emergence of novel foodborne pathogens poses significant challenges for public health surveillance because diagnostic testing has historically relied on culture-based methods, leaving most cases of foodborne illness unlinked to a specific causative agent. Furthermore, innocuous organisms often test positive in typical culture-dependent and culture-independent diagnostic tests, potentially confounding the accurate detection of closely related pathogens. Metagenomics-based diagnostic tests hold the potential to capture phylogenetic and functional diversity in its entirety without requiring isolation or cultivation of target organisms. However, such methods have not been applied in epidemiological studies due to a lack of standardized bioinformatics techniques for data analysis. In this study, we introduce a standardized framework, imGLAD, which addresses these limitations and allows the reliable detection of target organisms in environmental samples. imGLAD maps sequence reads against a reference genome and then calculates the likelihood that the genome is present based on logistic feature classification. imGLAD achieves higher accuracy than other tools for the same purpose because it uses the sequence-discrete population concept to decide which reads match, masks regions of the genome that are not informative using the MyTaxa pipeline, and models both sequencing breadth and depth to determine relative abundance and the detection limit. We validated imGLAD by analyzing metagenomes derived from spinach leaves inoculated with the human pathogen Escherichia coli O157:H7 (target organism), as well as soil samples spiked with genomic DNA of Bacillus anthracis (target genome). imGLAD provided abundance estimates close to the expected ones in both cases. The detection limit was estimated to be on the order of 100 cells, based on 1-2 Gb of Illumina shotgun sequencing of the microbiome recovered from 100 grams of field-grown spinach leaves. This detection limit is comparable to PCR-based approaches. However, this metagenomics-based method provides additional information about the structure and composition of the microbial community associated with the target organism that can be immediately incorporated into epidemiological investigations and risk assessments.
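
To illustrate the two mapping features named in the abstract, the following is a minimal sketch of how sequencing breadth and depth could be computed from mapped reads and passed to a logistic function. It is not the imGLAD implementation. The function names, the half-open interval representation of mapped reads, and the coefficients `intercept`, `w_breadth`, and `w_depth` are assumptions for illustration; in practice such coefficients would be fitted on training data. `expected_breadth` uses the standard random-sampling approximation (breadth = 1 - exp(-depth)), which is included only as a reference point and is not stated in the abstract.

```python
import math


def breadth_and_depth(read_intervals, genome_length):
    """Return (breadth, depth) for reads mapped to one reference genome.

    read_intervals: iterable of (start, end) half-open positions, 0-based.
    breadth: fraction of reference positions covered by at least one read.
    depth: mean number of reads covering each reference position.
    """
    coverage = [0] * genome_length
    for start, end in read_intervals:
        # Clip each read to the reference boundaries before counting.
        for pos in range(max(start, 0), min(end, genome_length)):
            coverage[pos] += 1
    covered_positions = sum(1 for c in coverage if c > 0)
    breadth = covered_positions / genome_length
    depth = sum(coverage) / genome_length
    return breadth, depth


def expected_breadth(depth):
    """Breadth expected if reads were sampled uniformly at random."""
    return 1.0 - math.exp(-depth)


def presence_probability(breadth, depth, intercept, w_breadth, w_depth):
    """Logistic function of breadth and depth (hypothetical coefficients)."""
    z = intercept + w_breadth * breadth + w_depth * depth
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy example: two overlapping reads on a 10-position reference.
    b, d = breadth_and_depth([(0, 4), (2, 6)], genome_length=10)
    print(b, d)
    # Coefficients below are placeholders, not fitted values.
    print(presence_probability(b, d, intercept=-2.0, w_breadth=5.0, w_depth=1.0))
```

Under this sketch, reads from a closely related non-target organism would tend to pile up in shared regions, giving a depth that is high relative to breadth, whereas a genuinely present target would approach the breadth expected for its depth. This is one plausible reason for modeling both features together.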