
Research Project: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

Title: CAPG: comprehensive allopolyploid genotyper

Author
KULKARNI, ROSHAN - Oak Ridge Institute For Science And Education (ORISE)
ZHANG, YUDI - Iowa State University
Cannon, Steven
DORMAN, KARIN - Iowa State University

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/10/2022
Publication Date: 11/11/2022
Citation: Kulkarni, R., Zhang, Y., Cannon, S.B., Dorman, K.S. 2022. CAPG: comprehensive allopolyploid genotyper. Bioinformatics. 39(1). Article btac729. https://doi.org/10.1093/bioinformatics/btac729.
DOI: https://doi.org/10.1093/bioinformatics/btac729

Interpretive Summary: Modern breeding approaches typically begin by identifying the genetic variations that distinguish one variety from another. This identification process, called genotyping, compares DNA sequences among different varieties or breeding materials to find the differences between them. Genotyping is relatively straightforward in diploids (species with two sets of chromosomes in each cell), but it is challenging in polyploids (species that have additional copies of each chromosome, often as a result of a merger of two moderately related species). Many crops are polyploids; examples include peanut, wheat, cotton, quinoa, and strawberry. This manuscript describes a new software package, the Comprehensive Allopolyploid Genotyper (CAPG), designed for genotyping in polyploid species. Results for peanut and cotton indicate improved accuracy over other available methods. These new software methods should assist breeding work on numerous polyploid species, helping breeders identify more efficiently and accurately the genetic variations that can be used to select for improved crop varieties.

Technical Abstract: Genotyping is an essential step in investigating genetic variation in plants. Genotyping increasingly uses next-generation sequencing to identify variants among the accessions being assessed. Sequencing-based genotyping in allopolyploids is particularly challenging, however, because allelic SNPs are difficult to distinguish from homoeologous SNPs. Because the subgenomes of an allopolyploid are similar, reads are easily assigned to the wrong subgenome, resulting in false heterozygous calls. Recently developed genotyping methods use prior information such as allelic frequencies, rate of heterozygosity, or parental genotype information to achieve better read assignment. However, when such information is unavailable, existing genotyping methods fail to identify SNPs accurately. This manuscript introduces the Comprehensive Allopolyploid Genotyper (CAPG), which uses subgenomic reference sequences and formulates an explicit likelihood model to accurately assign reads to subgenomes and genotype individual allopolyploids from whole-genome resequencing (WGS) data. CAPG uses the likelihood of alignment to each subgenome to appropriately weight the information in each sequence and make a genotype call at the allopolyploid level. We demonstrate the use of CAPG in allotetraploids. CAPG performs better than GATK’s HaplotypeCaller, a popular genotyping method, when applied to reads aligned to the combined subgenomic references. Source code is available at https://github.com/Kkulkarni1/CAPG.git
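
The idea of weighting each read by its likelihood of alignment to each subgenome can be illustrated with a toy calculation. The sketch below is not CAPG's implementation or its likelihood model; it only shows, under simplifying assumptions, how per-read alignment log-likelihoods could be turned into soft subgenome assignments instead of hard ones. The function names, the two-subgenome labels "A" and "B", the equal prior, and the example log-likelihood values are all hypothetical.

```python
import math

def subgenome_posteriors(loglik_a, loglik_b, prior_a=0.5):
    """Posterior probability that a read originated from subgenome A or B.

    loglik_a, loglik_b: log-likelihoods of the read's alignment to each
    subgenomic reference (hypothetical inputs, e.g. derived from base
    qualities and mismatches). prior_a is an assumed prior for subgenome A.
    """
    wa = math.log(prior_a) + loglik_a
    wb = math.log(1.0 - prior_a) + loglik_b
    m = max(wa, wb)                     # subtract the max for numerical stability
    za, zb = math.exp(wa - m), math.exp(wb - m)
    total = za + zb
    return za / total, zb / total

def weighted_allele_counts(reads):
    """Accumulate expected allele counts per subgenome at a single site.

    reads: iterable of (base, loglik_a, loglik_b) tuples. Each read
    contributes fractionally to both subgenomes according to its posterior,
    so an ambiguous read does not force a confident call in either one.
    """
    counts = {"A": {}, "B": {}}
    for base, loglik_a, loglik_b in reads:
        pa, pb = subgenome_posteriors(loglik_a, loglik_b)
        counts["A"][base] = counts["A"].get(base, 0.0) + pa
        counts["B"][base] = counts["B"].get(base, 0.0) + pb
    return counts

# Toy data: one ambiguous read and one read that fits subgenome A
# three times better than subgenome B (illustrative values only).
example_reads = [("G", -1.0, -1.0), ("T", 0.0, -math.log(3))]
example_counts = weighted_allele_counts(example_reads)
```

In this toy setting the ambiguous read splits its weight evenly between the two subgenomes, while the second read contributes mostly to subgenome A. A full genotyper would feed such weighted evidence into a genotype likelihood for each subgenome rather than stopping at counts.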