Research Project: Molecular Mechanisms of Host-Fungal Pathogen Interactions in Cereal Crops

Location: Crop Production and Pest Control Research

Title: Holistic genotyping of Amplicon panels with SPAdes and Blastn

Author
Crane, Charles
Crane, Yan Ma

Submitted to: Plant and Animal Genome Conference Proceedings
Publication Type: Abstract Only
Publication Acceptance Date: 11/4/2019
Publication Date: 1/12/2020
Citation: Crane, C.F., Crane, Y.M. 2020. Holistic genotyping of Amplicon panels with SPAdes and Blastn. Plant and Animal Genome XXVIII Conference Proceedings. January 11-15, 2020, San Diego, CA.

Interpretive Summary:

Technical Abstract: Although current-generation sequencing has allowed the recognition of myriad SNPs and small indels for use in genetic mapping and association analysis, few studies have investigated multi-SNP markers. Yet such markers might be more accurate and less prone to missing data than single SNPs, depending on the frequency of base-substitution, base-skipping, base-insertion, and run-miscount errors in reads. Failure to detect a particular amplicon allele would affect single-SNP and multi-SNP markers equally. Multi-SNP markers can have many more alleles than single SNPs, so all copies can be detected as distinct alleles in polyploids and population samples, which greatly simplifies the estimation of allelic dosage and phasing. We propose and test a simple protocol to identify and detect multi-SNP genotypes in amplicon panels, where the primer sequences are known but the intervening sequence is not known a priori. We stringently assemble the sequenced, quality-filtered amplicons with SPAdes to generate contigs, which represent all the alleles plus many combinations of sequencing errors. We then align each read to the contigs with blastn and assign the read to the contig with the highest bitscore. We count the number of times each contig is hit and call a genotype from the most frequently hit contigs on the basis of the ploidy and the ratios of the counts to one another. We will present simulation results from a diploid, a triploid, and a 14-ploid that represents the alleles in a mapping population derived from heterozygous hexaploid x heterozygous octoploid parents. The simulations will show the response of the called genotypes to sequencing errors and depth of read coverage. We will finish with a real example involving 52 amplified loci in 384 individuals in eight strains of Hessian fly (Mayetiola destructor).
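
The read-assignment and counting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the blastn results are in the standard 12-column tabular format (-outfmt 6), where column 1 is the read, column 2 is the contig, and column 12 is the bitscore. The function names, the min_fraction noise cutoff, and the largest-remainder rule for turning count ratios into allele dosages are all hypothetical choices made for this example; the abstract does not specify how the ratios are evaluated.

```python
from collections import Counter


def best_hits(blast_lines):
    """Assign each read to its highest-bitscore contig.

    Expects blastn tabular rows (-outfmt 6): qseqid, sseqid, ...,
    with the bitscore in the 12th column.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, contig_id, bitscore = fields[0], fields[1], float(fields[11])
        # Keep the first hit seen with the strictly highest bitscore.
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (contig_id, bitscore)
    return {read: contig for read, (contig, _) in best.items()}


def count_hits(blast_lines):
    """Count how many reads were assigned to each contig."""
    return Counter(best_hits(blast_lines).values())


def call_genotype(hit_counts, ploidy, min_fraction=0.1):
    """Convert contig hit counts into allele dosages summing to `ploidy`.

    Assumption for illustration only: contigs below `min_fraction` of all
    assigned reads are treated as sequencing-error contigs and dropped,
    then the remaining counts are apportioned by largest remainder.
    """
    total = sum(hit_counts.values())
    if total == 0:
        return {}
    kept = {c: n for c, n in hit_counts.items() if n / total >= min_fraction}
    if not kept:
        return {}
    kept_total = sum(kept.values())
    # Ideal (fractional) number of copies of each allele.
    quotas = {c: ploidy * n / kept_total for c, n in kept.items()}
    dosage = {c: int(q) for c, q in quotas.items()}
    # Hand out the leftover copies to the largest fractional parts.
    leftover = ploidy - sum(dosage.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - dosage[c], reverse=True)
    for contig in by_remainder[:leftover]:
        dosage[contig] += 1
    return {c: d for c, d in dosage.items() if d > 0}
```

In use, the lines of a blastn tabular output file would be passed to count_hits, and the resulting counts, together with the known ploidy, to call_genotype. A real analysis would likely need a more careful error model than a fixed fraction cutoff, particularly at high ploidy where a single-copy allele is expected to receive only a small share of the reads.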