Location: National Cold Water Marine Aquaculture Center
Title: Estimating microhaplotype allele frequencies from low-coverage or pooled sequencing data
Authors:
Delomas, Thomas
WILLIS, STUART - Columbia River Intertribal Fish Commission
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/30/2023
Publication Date: 11/3/2023
Citation: Delomas, T.A., Willis, S.C. 2023. Estimating microhaplotype allele frequencies from low-coverage or pooled sequencing data. Bioinformatics. https://doi.org/10.1186/s12859-023-05554-z.
DOI: https://doi.org/10.1186/s12859-023-05554-z
Interpretive Summary: Genetic applications in agricultural species and natural resource conservation often require obtaining genetic information from a large number of organisms. For example, an oyster breeding program may need to genotype 10,000 oysters annually. To make these applications cost-effective, genetic panels target a small number of highly informative genomic regions, called genetic markers. Microhaplotypes, a newly recognized class of genetic marker, have the potential to make these small panels more informative than current panels of the same size. However, identifying and screening microhaplotypes across the genome currently requires whole genome sequencing data from many individuals, which is cost-prohibitive. To lower these costs, we developed statistical methods that identify and screen microhaplotypes using low-coverage whole genome sequencing or pooled sequencing data. These methods will allow microhaplotype panels to be designed from cost-effective low-coverage whole genome sequencing or pooled sequencing data.
Technical Abstract:
Background: Microhaplotypes have the potential to be more cost-effective than SNPs for applications that require genetic panels of highly variable loci. However, development of microhaplotype panels is hindered by a lack of methods for estimating microhaplotype allele frequencies from low-coverage whole genome sequencing (WGS) or pooled sequencing (pool-seq) data.
Results: We developed new methods for estimating microhaplotype allele frequencies from low-coverage WGS and pool-seq data. We validated these methods using datasets from three non-model organisms. The methods allowed estimation of allele frequency and expected heterozygosity at depths routinely achieved with pooled sequencing.
Conclusions: These new methods will allow microhaplotype panels to be designed using low-coverage WGS and pool-seq data to discover and evaluate candidate loci. A Python script implementing the two methods, along with documentation, is available at https://www.github.com/delomast/mhFromLowDepSeq.
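The abstract does not describe the estimators themselves. As a rough illustration only, here is one generic way to estimate microhaplotype allele (haplotype) frequencies from pooled reads that may cover only some of the SNPs: an expectation-maximization (EM) fit of a mixture model with an assumed, fixed per-base error rate eps, followed by expected heterozygosity computed as H_e = 1 - sum(p_i^2). This is not the authors' method or the mhFromLowDepSeq code. The function names, the biallelic 0/1 SNP coding, and the independent-error model are hypothetical assumptions made for this sketch.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical read encoding: observed allele (0 or 1) at each SNP of the
# microhaplotype, or None where the read does not cover that SNP.
Read = Sequence[Optional[int]]
Haplotype = Tuple[int, ...]


def read_likelihood(read: Read, hap: Haplotype, eps: float) -> float:
    """P(read | haplotype), assuming independent errors at covered SNPs."""
    lik = 1.0
    for obs, true in zip(read, hap):
        if obs is None:  # SNP not covered: contributes no information
            continue
        lik *= (1.0 - eps) if obs == true else eps
    return lik


def em_haplotype_freqs(reads: List[Read], n_snps: int, eps: float = 0.01,
                       n_iter: int = 500, tol: float = 1e-10) -> Dict[Haplotype, float]:
    """EM estimate of haplotype frequencies in a pool (illustrative sketch)."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must be in (0, 0.5)")
    # Reads covering no SNP have equal likelihood under every haplotype.
    reads = [r for r in reads if any(a is not None for a in r)]
    if not reads:
        raise ValueError("no reads cover any SNP")
    haps = list(product((0, 1), repeat=n_snps))
    lik = [[read_likelihood(r, h, eps) for h in haps] for r in reads]
    freqs = [1.0 / len(haps)] * len(haps)  # uniform starting values
    for _ in range(n_iter):
        totals = [0.0] * len(haps)
        for row in lik:
            # E-step: posterior probability that this read came from each haplotype
            weights = [f * l for f, l in zip(freqs, row)]
            denom = sum(weights)
            for k, w in enumerate(weights):
                totals[k] += w / denom
        # M-step: new frequency = mean posterior membership across reads
        new = [t / len(reads) for t in totals]
        delta = max(abs(a - b) for a, b in zip(new, freqs))
        freqs = new
        if delta < tol:
            break
    return dict(zip(haps, freqs))


def expected_heterozygosity(freqs: Dict[Haplotype, float]) -> float:
    """H_e = 1 - sum(p_i^2), with no small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


if __name__ == "__main__":
    # Toy pool: half the reads support haplotype 00, half support 11.
    toy = [(0, 0)] * 50 + [(1, 1)] * 50
    est = em_haplotype_freqs(toy, n_snps=2)
    print({h: round(p, 3) for h, p in est.items()})  # ~0.5 for 00 and 11
    print(round(expected_heterozygosity(est), 3))
```

Enumerating all 2^K candidate haplotypes is only practical for microhaplotypes with a handful of SNPs; a fuller implementation would likely restrict candidates to haplotypes observed in reads. Low-coverage WGS of individuals would also require modeling each individual's two haplotypes, which this pooled-read sketch ignores.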