
Research Project: Integrated Research Approaches for Improving Production Efficiency in Rainbow Trout

Location: Cool and Cold Water Aquaculture Research

Title: Accurate genotype imputation from low-coverage whole-genome sequencing data of rainbow trout

Author
Liu, Sixin
Martin, Kyle - Troutlodge, Inc
Snelling, Warren
Long, Roseanna
Leeds, Timothy - Tim
Vallejo, Roger
Wiens, Gregory - Greg
Palti, Yniv

Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/5/2024
Publication Date: 7/23/2024
Citation: Liu, S., Martin, K.E., Snelling, W.M., Long, R., Leeds, T.D., Vallejo, R.L., Wiens, G.D., Palti, Y. 2024. Accurate genotype imputation from low-coverage whole-genome sequencing data of rainbow trout. G3, Genes/Genomes/Genetics. https://doi.org/10.1093/g3journal/jkae168.
DOI: https://doi.org/10.1093/g3journal/jkae168

Interpretive Summary: High-throughput genotyping methods are essential for modern genetics and breeding in rainbow trout. Although the trout 57K genotyping array has been widely used for genotyping in rainbow trout, it covers only a small fraction of the genetic variants in the rainbow trout genome. With the rapid and significant reduction in the cost of DNA sequencing, low-coverage whole-genome sequencing has emerged as a cost-effective alternative to array genotyping. In this study, we identified millions of genetic variants based on high-coverage whole-genome sequencing of 410 fish from five rainbow trout breeding populations. Using the genotypes based on high-coverage sequencing as a reference panel, we were able to predict genotypes from low-coverage whole-genome sequencing data. Compared to array-based genotyping, this new genotyping method increased the number of genotypes by more than two orders of magnitude, and the genotypes were highly accurate. Thus, we developed an attractive alternative high-throughput genotyping method for rainbow trout.

Technical Abstract: With the rapid and significant cost reduction of next-generation sequencing, low-coverage whole-genome sequencing (lcWGS) followed by genotype imputation is becoming a cost-effective alternative to SNP (single nucleotide polymorphism) array genotyping. The objectives of this study were two-fold: 1) to construct a haplotype reference panel for genotype imputation from lcWGS data in rainbow trout (Oncorhynchus mykiss); and 2) to evaluate the concordance between imputed genotypes and SNP-array genotypes in two breeding populations. To develop a haplotype reference panel for genotype imputation, high-coverage (12x) whole-genome sequences were obtained from a total of 410 fish representing five breeding populations that spawn in February (n=2), May, August, and November. The sequence reads were mapped to the rainbow trout reference genome, and genetic variants were identified using GATK. After data filtering, 20,434,612 biallelic SNPs were retained. Based on principal component analysis, the 410 fish were clustered into five groups consistent with their spawning dates and year-classes. The reference panel was phased with SHAPEIT5 and was used as a reference to impute genotypes from lcWGS data using GLIMPSE2. A total of 90 fish from the Troutlodge November breeding population were sequenced with an average coverage of 1.3x, and these fish were also genotyped with the Axiom 57K trout SNP array. The concordance between array-based genotypes and imputed genotypes was 99.1%. To evaluate the imputation accuracy at lower coverages, we downsampled the coverage to 0.5x, 0.2x and 0.1x, and the concordance between array-based genotypes and imputed genotypes was 98.7%, 97.8% and 96.7%, respectively. To further evaluate the accuracy of genotype imputation, 109 fish from the breeding program at the National Center for Cool and Cold Water Aquaculture were sequenced with an average coverage of 0.9x, and they were also genotyped with the 57K SNP array. Genotypes were imputed after downsampling the coverage to 0.5x. The concordance between array-based genotypes and imputed genotypes was 97.8%. In conclusion, the reference haplotype panel reported in this study can be used to accurately impute genotypes from lcWGS data in rainbow trout breeding populations.
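
The abstract does not spell out how concordance or downsampling was computed, so the following is only a minimal sketch of one common way to do both. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with missing calls as None, that concordance is the share of matching calls among sites called by both methods, and that downsampling keeps reads in proportion to the target coverage. The function names and toy data are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def genotype_concordance(
    array_calls: Sequence[Optional[int]],
    imputed_calls: Sequence[Optional[int]],
) -> float:
    """Fraction of matching calls among sites called by both methods.

    Assumption: genotypes are alternate-allele counts (0, 1, 2) and a
    missing call is None. Sites missing in either input are skipped.
    """
    if len(array_calls) != len(imputed_calls):
        raise ValueError("both inputs must cover the same sites")
    compared = 0
    matches = 0
    for array_gt, imputed_gt in zip(array_calls, imputed_calls):
        if array_gt is None or imputed_gt is None:
            continue  # not comparable at this site
        compared += 1
        if array_gt == imputed_gt:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable genotype pairs")
    return matches / compared


def downsample_fraction(original_coverage: float, target_coverage: float) -> float:
    """Share of reads to keep so that mean coverage drops to the target.

    Assumption: coverage scales linearly with the number of reads kept.
    """
    if not 0 < target_coverage <= original_coverage:
        raise ValueError("target must be positive and not exceed the original")
    return target_coverage / original_coverage


if __name__ == "__main__":
    # Hypothetical toy data for one fish at six sites.
    array_gt = [0, 1, 2, 1, None, 0]
    imputed_gt = [0, 1, 2, 2, 1, 0]
    print(f"concordance: {genotype_concordance(array_gt, imputed_gt):.3f}")

    # Coverages quoted in the abstract: 1.3x reduced to 0.5x, 0.2x, 0.1x.
    for target in (0.5, 0.2, 0.1):
        print(f"{target}x: keep {downsample_fraction(1.3, target):.3f} of reads")
```

Under these assumptions, reducing 1.3x data to 0.5x means keeping roughly 38% of the reads. A per-fish concordance like the one above would then be averaged or pooled across fish, a choice the abstract leaves unspecified.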