
Research Project: Genetic Improvement of North American Atlantic Salmon and the Eastern Oyster for Aquaculture Production

Location: National Cold Water Marine Aquaculture Center

Title: Estimating microhaplotype allele frequencies from low-coverage or pooled sequencing data

Authors
Delomas, Thomas
WILLIS, STUART - Columbia River Intertribal Fish Commission

Submitted to: BMC Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/30/2023
Publication Date: 11/3/2023
Citation: Delomas, T.A., Willis, S.C. 2023. Estimating microhaplotype allele frequencies from low-coverage or pooled sequencing data. BMC Bioinformatics. https://doi.org/10.1186/s12859-023-05554-z.
DOI: https://doi.org/10.1186/s12859-023-05554-z

Interpretive Summary: Genetic applications in agriculture and natural resource conservation often require genotyping a large number of organisms. For example, an oyster breeding program may need to genotype 10,000 oysters each year. To make these applications cost-effective, a genetic panel targets a small number of highly informative regions of the genome, called genetic markers. Microhaplotypes are a recently recognized class of genetic marker: a short region of the genome containing two or more single nucleotide polymorphisms (SNPs), where each combination of SNP alleles found together on one chromosome counts as a separate allele. Because a microhaplotype can have many alleles, it has the potential to make small panels more informative than panels of individual SNPs. However, identifying and screening microhaplotypes across the genome currently requires whole genome sequencing data from a large number of individuals, which is cost-prohibitive. To lower these costs, we developed statistical methods that estimate microhaplotype allele frequencies from low-coverage whole genome sequencing or pooled sequencing data. These methods will allow microhaplotype panels to be designed from these more cost-effective types of data.
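The claim that microhaplotypes can make small panels more informative rests on their having more than two alleles. A common summary of how informative a locus is, and the statistic the technical abstract also reports, is expected heterozygosity, He = 1 - sum of p_i^2 over allele frequencies p_i. The minimal sketch below uses a hypothetical helper `expected_heterozygosity` and made-up allele frequencies to show that a biallelic SNP cannot exceed He = 0.5, while a locus with four equally common alleles reaches 0.75.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity He = 1 - sum(p_i^2) for a single locus.

    Hypothetical helper for illustration. `freqs` maps allele name to
    frequency; the frequencies must sum to 1.
    """
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    return 1.0 - sum(p * p for p in freqs.values())


# A biallelic SNP reaches its maximum He of 0.5 when both alleles are at 0.5.
snp = {"A": 0.5, "G": 0.5}

# A made-up microhaplotype: three SNPs read together as one locus, with four
# observed haplotypes (alleles) at equal frequency.
microhap = {"ACT": 0.25, "GCT": 0.25, "ATA": 0.25, "GTA": 0.25}

print(expected_heterozygosity(snp))
print(expected_heterozygosity(microhap))
```

In general a locus with k equally frequent alleles has He = 1 - 1/k, so adding alleles raises the ceiling that a single biallelic SNP is stuck at.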

Technical Abstract:
Background: Microhaplotypes have the potential to be more cost-effective than single nucleotide polymorphisms (SNPs) for applications that require genetic panels of highly variable loci. However, development of microhaplotype panels is hindered by a lack of methods for estimating microhaplotype allele frequencies from low-coverage whole genome sequencing (WGS) or pooled sequencing (pool-seq) data.

Results: We developed new methods for estimating microhaplotype allele frequencies from low-coverage WGS and pool-seq data. We validated these methods using datasets from three non-model organisms. These methods allowed estimation of allele frequency and expected heterozygosity at depths routinely achieved by pooled sequencing.

Conclusions: These new methods will allow microhaplotype panels to be designed using low-coverage WGS and pool-seq data to discover and evaluate candidate loci. The Python script implementing the two methods, along with its documentation, is available at https://www.github.com/delomast/mhFromLowDepSeq.
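The abstract does not spell out the estimators. One difficulty in this setting is that a short read often spans only some of a microhaplotype's SNPs, so it cannot be assigned to a single allele. A generic way to handle this for pooled reads is expectation-maximization (EM), sketched below. This is an illustrative stand-in, not the method implemented in mhFromLowDepSeq: the helper names `compatible` and `em_haplotype_freqs`, the use of "N" for an uncovered SNP, the candidate haplotype list, and the read counts are all hypothetical, and the model assumes error-free reads drawn independently from the pool.

```python
from collections import Counter


def compatible(read, hap):
    """True if `read` matches `hap` at every SNP it covers ('N' = not covered)."""
    return all(r == "N" or r == h for r, h in zip(read, hap))


def em_haplotype_freqs(reads, haplotypes, max_iter=1000, tol=1e-10):
    """Illustrative EM estimate of microhaplotype frequencies from pooled reads.

    Hypothetical helper, not the mhFromLowDepSeq method. Assumes no sequencing
    error, that read coverage does not depend on the haplotype, and that each
    read is an independent draw of one haplotype from the pool.
    """
    read_counts = Counter(reads)  # identical reads share one E-step computation
    compat = {r: [h for h in haplotypes if compatible(r, h)] for r in read_counts}
    for r, hs in compat.items():
        if not hs:
            raise ValueError(f"read {r!r} matches no candidate haplotype")
    n_reads = sum(read_counts.values())
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(max_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        # E-step: share each read among its compatible haplotypes in
        # proportion to their current frequency estimates.
        for r, count in read_counts.items():
            denom = sum(freqs[h] for h in compat[r])
            for h in compat[r]:
                expected[h] += count * freqs[h] / denom
        # M-step: new frequency = expected fraction of reads from each haplotype.
        new = {h: expected[h] / n_reads for h in haplotypes}
        converged = max(abs(new[h] - freqs[h]) for h in haplotypes) < tol
        freqs = new
        if converged:
            break
    return freqs


# Two SNPs; 'N' marks a SNP the read does not cover. Counts are made up.
reads = (["AC"] * 40 + ["GT"] * 40 + ["AN"] * 10 + ["GN"] * 10
         + ["NC"] * 10 + ["NT"] * 10)
est = em_haplotype_freqs(reads, ["AC", "AT", "GC", "GT"])
print({h: round(p, 3) for h, p in est.items()})
```

With these made-up counts the estimates settle near 0.5 for AC and GT and near zero for AT and GC; partial reads such as "AN" are shared between the haplotypes they fit rather than discarded. A practical estimator would also need a sequencing-error model and, for low-coverage WGS of individuals, a way to account for diploid genotypes; those details are outside this sketch.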