
Research Project: Genetic Improvement of Biotic and Abiotic Stress Tolerance and Nutritional Quality in Hard Winter Wheat

Location: Hard Winter Wheat Genetics Research

Title: Haplocatcher: A package for prediction of haplotypes

Author
WINN, ZACHARY - Colorado State University
HUDSON-ARNS, EMILY - Colorado State University
HAMMERS, MIKAYLA - Colorado State University
DEWITT, NOAH - Louisiana State University
LYERLY, JEANETTE - North Carolina State University
Bai, Guihua
St Amand, Paul
HALEY, SCOTT - Colorado State University
MASON, RICHARD - Colorado State University

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/15/2023
Publication Date: 7/24/2023
Citation: Winn, Z.J., Hudson-Arns, E., Hammers, M., Dewitt, N., Lyerly, J., Bai, G., St Amand, P.C., Haley, S., Mason, R.E. 2023. Haplocatcher: A package for prediction of haplotypes. The Plant Genome. https://doi.org/10.1101/2023.07.20.549744.
DOI: https://doi.org/10.1101/2023.07.20.549744

Interpretive Summary: Breeders use molecular markers to identify lines that carry beneficial haplotypes. Breeding programs may leverage genome-wide PCR-based marker information to make inferences about target haplotypes in newly sequenced lines. In this study, we developed "HaploCatcher", an R package that predicts haplotypes of interest in lines genotyped with genome-wide markers. The package predicted the Sst1 haplotypes of 292 new breeding lines with high accuracy, using data from 1,056 wheat breeding lines that had both genome-wide marker data and Sst1 marker data. The package is freely available and can be used to predict haplotypes in whole-genome sequenced, early-generation materials.
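
As a rough illustration of the idea in this summary, the sketch below predicts a haplotype class for a new line from its genome-wide marker calls by majority vote among the most similar training lines (k-nearest neighbors, one of the two model types named in the technical abstract). It is written in Python for illustration only and is not the HaploCatcher implementation, which is an R package. The function name `knn_predict`, the 0/1/2 marker coding, the Manhattan distance, the haplotype labels `hap_A` and `hap_B`, and all data values are hypothetical assumptions.

```python
from collections import Counter


def knn_predict(train_markers, train_haplotypes, new_markers, k=3):
    """Predict a haplotype label by majority vote among the k nearest training lines.

    Markers are assumed to be coded 0/1/2 (an assumption, not taken from the source).
    Distance is the Manhattan distance between marker vectors.
    """
    if len(train_markers) != len(train_haplotypes):
        raise ValueError("each training line needs exactly one haplotype label")
    # Pair every training line's distance to the new line with its label.
    distances = sorted(
        (sum(abs(a - b) for a, b in zip(row, new_markers)), label)
        for row, label in zip(train_markers, train_haplotypes)
    )
    # Count the labels of the k closest training lines and return the most common.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: six training lines, four genome-wide markers each.
train_markers = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 1, 0],
]
train_haplotypes = ["hap_A", "hap_A", "hap_A", "hap_B", "hap_B", "hap_B"]

prediction_1 = knn_predict(train_markers, train_haplotypes, [0, 0, 2, 2])
prediction_2 = knn_predict(train_markers, train_haplotypes, [2, 2, 0, 1])
print(prediction_1, prediction_2)
```

In practice, the markers used for prediction, the distance measure, and the value of `k` would all need to be chosen and validated on real data; the toy values here only show the mechanics.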

Technical Abstract: Wheat (Triticum aestivum L.) is crucial to global food security, but it is often threatened by diseases, pests, and environmental stresses. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes. Breeding programs have invested heavily in genome-wide genotyping platforms that produce high-volume, non-targeted molecular information. Early-stage lines for which non-targeted genotypes are available are not characterized for beneficial haplotypes. This implies that breeding programs may leverage genome-wide polymerase chain reaction (PCR)-based marker information to make inferences about haplotypes in newly sequenced lines. In this study, an R package titled "HaploCatcher" was developed to predict specific haplotypes of interest in lines genotyped with genome-wide markers. A training population of 1,056 lines genotyped for the Sst1 locus and for genome-wide markers was curated to predict the Sst1 haplotypes of 292 lines from the Colorado State University wheat breeding program. Sst1 haplotypes predicted from the training population were compared with marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed Colorado State lines demonstrated that a random forest model, trained on all available training data, had accuracy comparable to that estimated in cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and from predictive modeling were not significantly different. The HaploCatcher package is freely available and may be used by breeding programs to predict haplotypes in whole-genome sequenced, early-generation materials.
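
The kappa scores reported above measure agreement between predicted and marker-derived haplotype calls after correcting for agreement expected by chance. The minimal sketch below assumes the statistic is Cohen's (unweighted) kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement implied by the class frequencies. The function name `cohen_kappa`, the labels `hap_A` and `hap_B`, and the ten example calls are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(observed, predicted):
    """Cohen's kappa between two equal-length sequences of class labels."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of lines where the two calls match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1:
        # Only one class present in both inputs: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical calls for ten lines (illustration only).
marker_calls = ["hap_A"] * 4 + ["hap_B"] * 6
model_calls = ["hap_A"] * 3 + ["hap_B"] * 6 + ["hap_A"]

kappa = cohen_kappa(marker_calls, model_calls)
print(round(kappa, 3))
```

Here eight of ten calls agree (p_o = 0.8) and the chance agreement is p_e = 0.52, so kappa = 0.28 / 0.48, or about 0.58. Kappa is lower than raw accuracy because some agreement is expected by chance alone.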