Research Project: Development of New and Improved Surveillance, Detection, Control, and Management Technologies for Fruit Flies and Invasive Pests of Tropical and Subtropical Crops

Location: Tropical Crop and Commodity Protection Research

Title: HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly

Author
Sim, Sheina
Corpuz, Renee
Simmonds, Tyler - Oak Ridge Institute for Science and Education (ORISE)
Geib, Scott

Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/8/2022
Publication Date: 2/22/2022
Citation: Sim, S.B., Corpuz, R.L., Simmonds, T.J., Geib, S.M. 2022. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. Biomed Central (BMC) Genomics. 23. Article 157. https://doi.org/10.1186/s12864-022-08375-1.
DOI: https://doi.org/10.1186/s12864-022-08375-1

Interpretive Summary: The Pacific Biosciences (PacBio) High-Fidelity (HiFi) read technology has revolutionized genomics and high-throughput sequencing. PacBio HiFi is currently the industry-standard whole-genome sequencing platform and has been adopted by genome sequencing and assembly projects and initiatives such as the Earth BioGenome Project, the Vertebrate Genome Project, and the 5000 insect genomes (i5K) initiative. Though filtering out adapter-contaminated reads is a routine part of traditional short-read sequencing and analysis pipelines, it has not yet been widely adopted in PacBio HiFi read workflows. Our analysis of 55 publicly available PacBio HiFi datasets in the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) revealed that a read-sanitation step that removes reads contaminated with the PacBio Blunt End Adapter from the raw-read pool is necessary, because the adapter sequences can otherwise be erroneously integrated into the final assembly. In this manuscript, we describe the nature of the adapter-contaminated reads, their consequences in the final assembly, and a simple solution for removing them prior to assembly to produce the optimal genome assembly.

Technical Abstract: PacBio HiFi read technology is currently the industry-standard sequencing method and has been widely adopted by large sequencing and assembly initiatives. Though adapter contamination filtering is a routine part of traditional short-read analysis pipelines, it has not yet been widely adopted in PacBio HiFi workflows. Analysis of 55 publicly available PacBio HiFi datasets revealed that a read-sanitation step that removes reads contaminated with the PacBio Blunt End Adapter from the raw-read pool is necessary, because adapter sequences can otherwise be erroneously integrated into final assemblies. Here we describe the nature of adapter-contaminated reads, their consequences in the final assembly, and a simple solution for removing them prior to assembly to produce the optimal assembly.
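
The idea of a read-sanitation step, in which any read containing adapter sequence is dropped before assembly, can be illustrated with a minimal Python sketch. This is not the HiFiAdapterFilt implementation. It assumes exact substring matching on both strands, which is a deliberate simplification, and the adapter string, function names, and read tuples are all hypothetical placeholders chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (read name, nucleotide sequence)

# Hypothetical placeholder sequence, NOT the real PacBio adapter.
EXAMPLE_ADAPTER = "ACGTTGCAAC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_contaminated(seq: str, adapter: str) -> bool:
    """True if the adapter occurs in the read on either strand.

    Assumption: exact matching only. A real filter would need to
    tolerate mismatches and partial adapter hits.
    """
    seq = seq.upper()
    adapter = adapter.upper()
    return adapter in seq or reverse_complement(adapter) in seq


def filter_reads(reads: Iterable[Read],
                 adapter: str = EXAMPLE_ADAPTER) -> Iterator[Read]:
    """Yield only the reads that contain no adapter sequence."""
    for name, seq in reads:
        if not is_contaminated(seq, adapter):
            yield name, seq


if __name__ == "__main__":
    reads = [
        ("r1", "TTTTACGTTGCAACTTTT"),  # adapter on the forward strand
        ("r2", "TTTTGTTGCAACGTTTTT"),  # adapter as reverse complement
        ("r3", "TTTTTTTTGGGGGG"),      # no adapter
    ]
    for name, _ in filter_reads(reads):
        print(name)
```

In this sketch whole reads are discarded rather than trimmed, matching the abstract's description of removing adapter-contaminated reads from the raw-read pool. Checking the reverse complement matters because a read may come from either strand.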