Location: Cool and Cold Water Aquaculture Research
Title: Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing
Author
GOLDSTEIN, SARAH - University Of Connecticut
BEKA, LIDIA - University Of Connecticut
GRAF, JOERG - University Of Connecticut
KLASSEN, JONATHAN - University Of Connecticut
Submitted to: BMC Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/16/2018
Publication Date: 1/9/2019
Citation: Goldstein, S., Beka, L., Graf, J., Klassen, J. 2019. Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing. BMC Bioinformatics. 20(23):1-17.
DOI: https://doi.org/10.1186/s12864-018-5381-7
Interpretive Summary: Because the genome encodes all of the proteins made by a bacterium, obtaining accurate genome sequences is key for identifying virulence factors, separating pathogenic from benign bacteria, and identifying optimal vaccine candidates. A major challenge is that genomes with extreme %GC contents are difficult to assemble well using the commonly used Illumina short-read technology. Important fish pathogens such as Flavobacterium columnare and F. psychrophilum have a very low %GC and are notoriously difficult to assemble. In this manuscript we assessed different bioinformatic pipelines that used sequences obtained using the Oxford Nanopore MinION, the Illumina MiSeq, or both. We found that first assembling the genome using the MinION reads and then using MiSeq reads to correct sequence errors was the best approach for the majority of genomes we assembled. As part of this study, we were able to reduce the number of fragments in the bacterial genome assemblies from well over 50 down to 1 closed, circular genome. The approach used in this study will greatly improve the ability to sequence the genomes of fish pathogens.
Technical Abstract: Background: Short-read sequencing technologies have made microbial genome sequencing cheap and accessible. However, closing genomes is often costly, and assembling short reads from genomes that are repetitive and/or have extreme %GC content remains challenging.
Long-read, single-molecule sequencing technologies such as the Oxford Nanopore MinION have the potential to overcome these difficulties, although the best approach for harnessing their potential remains poorly evaluated.
Results: We sequenced nine bacterial genomes spanning a wide range of GC contents using Illumina MiSeq and Oxford Nanopore MinION sequencing technologies to determine the advantages of each approach, both individually and combined. Assemblies using only MiSeq reads were highly accurate but lacked contiguity, a deficiency that was partially overcome by adding MinION reads to these assemblies. Even more contiguous genome assemblies were generated by using MinION reads for initial assembly, but these were more error-prone and required further polishing. Increased genome contiguity dramatically improved the annotation of insertion sequences and secondary metabolite biosynthetic gene clusters, likely because long reads can disambiguate these highly repetitive but biologically important genomic regions.
Conclusions: Genome assembly using short reads is challenged by repetitive sequences and those with extreme GC contents. Our results indicate that these difficulties can be largely overcome by using single-molecule, long-read sequencing technologies such as the Oxford Nanopore MinION. Using MinION reads for assembly followed by polishing with Illumina reads generated the most contiguous genomes, which enabled the accurate annotation of important but difficult-to-sequence genomic features such as insertion elements and secondary metabolite biosynthetic gene clusters. The combination of MinION and Illumina sequencing is cost-effective and dramatically advances studies of microbial evolution and genome-driven drug discovery.
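
The abstract leans on two quantities: the %GC content of a genome and the contiguity of an assembly. The sketch below shows one way each could be computed. It is not the authors' pipeline. N50 is assumed here as a common contiguity metric (the abstract does not name the metric used), and the function names and example values are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical illustration: the same total length as 50 fragments
# versus a single closed contig.
fragmented = [100] * 50
closed = [5000]
print(n50(fragmented), n50(closed))
```

Under this metric, a single closed contig has an N50 equal to the genome length, whereas a fragmented assembly of the same total size has a much smaller N50.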