Title: A ribosomal operon database and megablast settings for strain-level resolution of microbiomes

Author
KERKHOF, LEE - Rutgers University
ROTH, PIERCE - DCS Corp
DESHPANDE, SAMIR - US Army, CCDC Chemical Biological Center
BERNHARDS, CORY - US Army, CCDC Chemical Biological Center
LIEM, ALVIN - DCS Corp
HILL, JESSICA - DCS Corp
HAGGBLOM, MAX - Rutgers University
WEBSTER, NICOLE - Australian Inst Of Marine Sciences
IBIRONKE, OLUFUNMILOLA - Rutgers University
MIRZOYAN, SEDA - Rutgers University
Polashock, James
SULLIVAN, RAYMOND - Joint Program Executive Office For Chemical, Biological, Radiological And Nuclear Defense

Submitted to: FEMS Microbes
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/14/2022
Publication Date: 1/27/2022
Citation: Kerkhof, L.J., Roth, P.A., Deshpande, S.V., Bernhards, C.R., Liem, A.T., Hill, J.M., Haggblom, M.M., Webster, N.S., Ibironke, O., Mirzoyan, S., Polashock, J.J., Sullivan, R.F. 2022. A ribosomal operon database and megablast settings for strain-level resolution of microbiomes. FEMS Microbes. https://doi.org/10.1093/femsmc/xtac002.
DOI: https://doi.org/10.1093/femsmc/xtac002

Interpretive Summary: Current methods to characterize communities of microorganisms in any environment generally rely on analyzing tiny pieces of DNA recovered from the samples. Those pieces are then compared with sequences in public databases to identify the microorganisms from which they originated. Although this method works for identification at the species level, it cannot distinguish between strains within a species. Identifying the strain is important because variation among strains can determine organism function. For example, some strains of bacteria are completely harmless to healthy humans, while others in the same species can cause disease. To facilitate strain-level identification of microbes, we created a publicly available, curated database based on much longer pieces of DNA than those found in other databases. We demonstrated that the database provides strain-level resolution for environmental samples as well as simulated microbial communities. In addition, data can be obtained using portable equipment, which allows use in the field. This database is expected to grow over time and will be extremely useful to research scientists, the medical community, and others who require rapid determination of microbial taxonomy to the strain level.

Technical Abstract: Current methods to characterize microbial communities generally employ sequencing of select variable regions of the 16S rRNA gene using short reads (<500 bp) with high sequence accuracy (~99%), but these afford only limited phylogenetic resolution. One solution is long-read sequencing, which allows for the profiling of entire ribosomal operons [16S-ITS-23S rRNA genes] to characterize the microbiome. Unfortunately, the creation of rRNA operon databases has lagged behind that of databases for the 16S, 23S, and ITS regions alone. Here, we describe an rRNA operon database with >300,000 entries, representing >10,000 prokaryotic species and ~150,000 strains. BLAST analysis parameters were identified for strain-level resolution using in silico mutated mock rRNA operon sequences (70–95% identity) from 4 bacterial phyla and 2 members of the Euryarchaeota. In performance testing on a Mac Mini, analysis took <0.5–37 seconds per read and achieved strain-level resolution at >73% identity for all mutated sequences. However, a typical MinION library contains ~50,000 reads, so most settings were too slow for practical analysis in the field. Therefore, BLAST settings were optimized to require <3 seconds per read while retaining strain-level resolution for sequences with >84% identity. These settings were tested on sequence libraries generated from a diverse set of habitats: the human respiratory tract, farm/forest soils, and marine sponges (n = 1,322,818 reads). Most reads in this data set yielded best BLAST hits (95 ± 8%). However, only 38–82% of library reads were compatible with strain-level resolution, reflecting the dominance of human/biomedical-associated prokaryotic entries in the database. Since the MinION and the Mac Mini are both portable, this study demonstrates the possibility of strain-level microbiome analysis in the field.
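
To see why per-read search time matters at library scale, the per-read figures quoted in the abstract can be multiplied out for a ~50,000-read library. The following is a rough back-of-the-envelope sketch, not the authors' benchmark: it assumes reads are searched one at a time with a constant per-read cost, and it treats the "<0.5" and "<3" second figures as if they were exactly 0.5 and 3 seconds. The function name `library_runtime_hours` is hypothetical and introduced only for illustration.

```python
# Back-of-the-envelope estimate; assumes serial processing at a fixed per-read cost.
READS_PER_LIBRARY = 50_000  # "typical MinION library" size quoted in the abstract


def library_runtime_hours(seconds_per_read: float, reads: int = READS_PER_LIBRARY) -> float:
    """Return wall-clock hours needed to search every read in a library."""
    return seconds_per_read * reads / 3600


# Per-read times taken from the abstract; "<0.5" and "<3" are treated as upper bounds.
scenarios = [
    ("fastest tested setting", 0.5),
    ("optimized settings (upper bound)", 3.0),
    ("slowest tested setting", 37.0),
]

for label, secs in scenarios:
    hours = library_runtime_hours(secs)
    print(f"{label}: {secs:g} s/read -> {hours:.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions, the slowest tested setting would need roughly three weeks for a single library, whereas a 3-second bound keeps the same library under two days.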