Research Project: Genetic Improvement of Biotic and Abiotic Stress Tolerance and Nutritional Quality in Hard Winter Wheat

Location: Hard Winter Wheat Genetics Research

Title: Imputation accuracy of wheat GBS data using barley and wheat genome references

Author
HADI, ALIPOUR - Kansas State University
Bai, Guihua
ZHANG, GUORONG - Kansas State University
MOHAMMAD, BIHAMTA - Kansas State University
MOHAMMADI, VALIOLLAH - Kansas State University
PEYGHAMBARI, SEYED - Kansas State University

Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/20/2018
Publication Date: 1/7/2019
Citation: Hadi, A., Bai, G., Zhang, G., Mohammad, B.R., Mohammadi, V., Peyghambari, S.A. 2019. Imputation accuracy of wheat GBS data using barley and wheat genome references. PLoS One. 14(1):e0208614. https://doi.org/10.1371/journal.pone.0208614.
DOI: https://doi.org/10.1371/journal.pone.0208614

Interpretive Summary: Genotyping-by-sequencing (GBS) is a new genotyping technology based on next-generation sequencing that has been widely used in wheat genetics and breeding. Although GBS can discover millions of genetic markers, its high rate of missing data is a major concern for many applications. Imputation using reference genome sequences can fill in most of the missing data. We tested four reference genomes for imputing missing GBS data and compared their imputation accuracies. The IWGSC RefSeq v1.0 reference provided the most imputed markers, and the W7984 assembly provided the best imputation accuracy among the four references. Therefore, both reference genome sequences are effective for imputing missing GBS data for breeding applications.

Technical Abstract: Genotyping-by-sequencing (GBS) provides high SNP coverage and has recently emerged as a popular technology for genetic and breeding applications in bread wheat (Triticum aestivum L.) and many other plant species. Although GBS can discover millions of SNPs, its high rate of missing data is a major concern for many applications. Accurate imputation of those missing data can significantly improve the utility of GBS data. This study compared imputation accuracies among four reference genomes, three from wheat (Chinese Spring survey sequence, W7984, and IWGSC RefSeq v1.0) and one from barley, by comparing imputed data derived from low-depth sequencing with actual data from high-depth sequencing. After imputation, the average number of imputed data points was highest in the B genome (~48.99%). The D genome had the fewest imputed data points (~15.02%) but the highest imputation accuracy. Among the four reference genomes, the IWGSC RefSeq v1.0 reference provided the most imputed data points but the lowest imputation accuracy for SNPs with < 10% minor allele frequency (MAF). The W7984 reference, however, provided the highest imputation accuracy for SNPs with < 10% MAF.
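
The comparison described in the abstract, scoring imputed low-depth calls against high-depth calls and grouping SNPs by MAF, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which this record does not describe. The genotype coding (0, 1, 2 alternate-allele counts, None for missing) and the function names minor_allele_frequency and imputation_accuracy are assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1, 2 = count of alternate alleles; None = missing call.
Genotype = Optional[int]


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Return the MAF of one SNP from its non-missing diploid calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def imputation_accuracy(
    low_depth: Sequence[Genotype],
    imputed: Sequence[Genotype],
    high_depth: Sequence[Genotype],
) -> Optional[float]:
    """Fraction of imputed calls that match the high-depth call.

    Only positions that were missing at low depth, were filled in by
    imputation, and have a high-depth call are scored. Returns None
    when no position qualifies.
    """
    scored = 0
    correct = 0
    for low, imp, truth in zip(low_depth, imputed, high_depth):
        if low is None and imp is not None and truth is not None:
            scored += 1
            if imp == truth:
                correct += 1
    return correct / scored if scored else None


# Hypothetical calls for one SNP across five samples.
low = [0, None, None, 2, None]
imp = [0, 0, 2, 2, 1]
high = [0, 0, 1, 2, None]

accuracy = imputation_accuracy(low, imp, high)
maf = minor_allele_frequency(high)
is_low_maf = maf < 0.10  # the < 10% MAF class mentioned in the abstract
```

In this toy case two imputed positions can be checked against high-depth calls and one of them matches. Averaging such per-SNP scores within a MAF class, per reference genome, would give the kind of comparison the abstract reports.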