Location: Genetics and Animal Breeding
Title: Classification of 16S rRNA reads is improved using a niche-specific database constructed by near-full length sequencing
Author
MYER, PHILLIP - University Of Tennessee
McDaneld, Tara
Kuehn, Larry
DEDONDER, KEITH - Kansas State University
APLEY, MICHAEL - Kansas State University
CAPIK, SARAH - Kansas State University
LUBBERS, BRIAN - Kansas State University
Harhay, Gregory
Harhay, Dayna
Keele, John
HENNIGER, MADISON - University Of Tennessee
CLEMMONS, BROOKE - University Of Tennessee
Smith, Timothy - Tim
Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/17/2020
Publication Date: 7/13/2020
Citation: Myer, P.R., McDaneld, T.G., Kuehn, L.A., Dedonder, K.D., Apley, M.D., Capik, S.F., Lubbers, B.V., Harhay, G.P., Harhay, D.M., Keele, J.W., Henniger, M.T., Clemmons, B.A., Smith, T.P.L. 2020. Classification of 16S rRNA reads is improved using a niche-specific database constructed by near-full length sequencing. PLoS One. 15(7):e0235498. https://doi.org/10.1371/journal.pone.0235498.
DOI: https://doi.org/10.1371/journal.pone.0235498
Interpretive Summary: Surveys of microbial populations in environmental samples often use sequence variation in the bacterial gene encoding the ribosomal small subunit (the 16S rRNA gene). Generally, these surveys amplify portions of the 16S rRNA gene, sequence the amplified portions in bulk, and assign them to taxonomic categories by comparison to sequence databases that link portions of the 16S rRNA gene to specific bacterial organisms. Because of the sequence length constraints of the most popular sequencing platforms, the amplified portions contain only one to three of the nine variable regions of the gene, and taxonomic assignment is based on relatively short stretches of sequence (150-500 bases). We demonstrate that taxonomic assignment is improved by using a sequence database that is specific to the target environment, in this case the upper respiratory tract (URT) of cattle. We created a custom database from full-length sequences (containing all nine variable regions) from the URT and then compared shorter sequences (containing one to three variable regions) from URT samples to the database for taxonomic classification. This process improves the ability to detect changes in the microbial populations of a given environment and the accuracy of defining the content of that environment at higher taxonomic resolution.
Technical Abstract: Surveys of microbial populations in environmental niches of interest often utilize sequence variation in the gene encoding the ribosomal small subunit (the 16S rRNA gene). Generally, these surveys target the 16S genes using semi-degenerate primers to amplify portions of the gene from a subset of bacterial species, sequence the amplicons in bulk, and assign them to putative taxonomic categories by comparison to databases purporting to connect specific sequences in the main variable regions of the gene to specific organisms. Due to sequence length constraints of the most popular bulk sequencing platforms, the primers selected amplify one to three of the nine variable regions, and taxonomic assignment is based on relatively short stretches of sequence (150–500 bases). We demonstrate that taxonomic assignment is improved through reduced unassigned reads by including a survey of near-full-length sequences specific to the target environment, using a niche of interest represented by the upper respiratory tract (URT) of cattle. We created a custom Bovine URT database from these longer sequences for assignment of shorter, less expensive reads in comparisons of the upper respiratory tract among individual animals. This process improves the ability to detect changes in the microbial populations of a given environment, and the accuracy of defining the content of that environment at increasingly higher taxonomic resolution.
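
The effect described in the abstract, fewer unassigned reads when the reference includes sequences from the niche itself, can be illustrated with a toy sketch. This is not the authors' pipeline: the shared k-mer classifier, the function names, the 0.5 threshold, and the two short sequences standing in for near-full-length 16S references are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def kmers(seq, k=8):
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(reference, k=8):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in reference.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index

def classify(read, index, k=8, min_fraction=0.5):
    """Return the taxon sharing the most k-mers with the read, or None
    (unassigned) if fewer than min_fraction of the read's k-mers match it."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    votes = Counter()
    for km in read_kmers:
        for taxon in index.get(km, ()):
            votes[taxon] += 1
    if not votes:
        return None
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits / len(read_kmers) >= min_fraction else None

def unassigned_fraction(reads, index, k=8):
    """Fraction of reads that could not be assigned to any taxon."""
    calls = [classify(read, index, k) for read in reads]
    return sum(call is None for call in calls) / len(reads)

# Made-up sequences (assumption): stand-ins for long reference sequences.
taxon_a = "ACGTTGCAAGCTTAGGCTAACGGATCCTAGGCATCGATTGCA"
taxon_b = "TTGACCGTAAGGCTCCATGGAATTCGGCTAAGCTTGCACGTA"

# A "generic" database missing the niche organism, and a "niche" database
# that also contains the sequence recovered from the environment itself.
generic_index = build_index({"Taxon_A": taxon_a})
niche_index = build_index({"Taxon_A": taxon_a, "Taxon_B": taxon_b})

# Short reads: one from taxon A, two from taxon B.
reads = [taxon_a[5:25], taxon_b[5:25], taxon_b[15:35]]

generic_unassigned = unassigned_fraction(reads, generic_index)
niche_unassigned = unassigned_fraction(reads, niche_index)
print(f"unassigned (generic): {generic_unassigned:.2f}")
print(f"unassigned (niche):   {niche_unassigned:.2f}")
```

In this toy setup the reads drawn from the second taxon stay unassigned until its sequence is added to the reference. Real 16S classifiers use more elaborate scoring and curated taxonomies, so the sketch shows only the direction of the effect, not its size.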