Location: Agricultural Genetic Resources Preservation Research
Title: A pan-genome data structure induced by pooled sequencing facilitates variant mining in heterogeneous germplasm
Author: Reeves, P.A., Richards, C.M.
Submitted to: Molecular Breeding
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/8/2022
Publication Date: 6/25/2022
Citation: Reeves, P.A., Richards, C.M. 2022. A pan-genome data structure induced by pooled sequencing facilitates variant mining in heterogeneous germplasm. Molecular Breeding. 42. Article e36. https://doi.org/10.1007/s11032-022-01308-6.
DOI: https://doi.org/10.1007/s11032-022-01308-6

Interpretive Summary: Gene banks often contain samples that are mixtures of genotypes. This intra-accession variation impedes the characterization of genetic resources because characterization data cannot adequately account for variation within an accession. Here we describe and validate a pooled sequencing approach as an alternative to purifying lines through single seed descent before genotyping. The approach aims to overcome this limitation in diversity characterization by providing users with information about the full range of genetic variation within a heterogeneous sample. We demonstrate that pooled samples sequenced together and assembled into a composite reference genome (pan genome) can be used to develop a mapped catalog of short haplotypes that can be queried using standard tools to identify accessions containing variation at agronomically important loci.

Technical Abstract: Valuable genetic variation lies unused in gene banks due to the difficulty of exploiting heterogeneous germplasm accessions. Advances in molecular breeding, including transgenics and genome editing, present the opportunity to exploit hidden sequence variation directly. Here we describe the pan genome data structure induced by whole-genome sequencing of pooled individuals from wild populations of Patellifolia spp., a source of disease resistance genes for the related crop species sugar beet (Beta vulgaris).
We represent the pan genome of a heterogeneous population sample as a set of phased reads mapped to a reference genome assembly derived from the sequence pool or from other data. We show that this basic data structure can be queried by homology to identify short haplotypic variants present in the wild relative at genes of agronomic interest in the crop. Further, we demonstrate that it is possible to catalog short haplotypic variation in all Patellifolia genomic regions that have corresponding single-copy orthologous regions in sugar beet. The data structure, termed a "phased read archive", can be produced, altered, and queried using standard tools to facilitate discovery of agronomically important sequence variation in heterogeneous germplasm.
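The idea that mapped reads already form a catalog of short haplotypes can be illustrated with a small sketch. The record does not describe the archive's file format or the tools used to query it, so the following Python is a minimal illustration under stated assumptions, not the authors' implementation: the `MappedRead` record and `haplotypes_in_window` function are hypothetical names, reads are assumed to align without indels, and the query window is assumed to have been located beforehand by a homology search (for example, aligning a sugar beet gene to the reference). Because each read samples one chromosome, the bases it carries across a window are already phased, so every read spanning the window contributes one short haplotype.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedRead:
    """One read in a hypothetical phased read archive (illustrative only).

    A read samples a single chromosome, so the alleles it carries are
    phased across its length.
    """
    accession: str  # accession (pool) the read was sequenced from
    contig: str     # reference contig the read maps to
    start: int      # 0-based leftmost mapped position on the contig
    seq: str        # read bases; ASSUMED to align without indels


def haplotypes_in_window(archive, contig, win_start, win_end):
    """Map each short haplotype in [win_start, win_end) to the accessions carrying it.

    Only reads covering the whole window contribute, so every key is a
    complete haplotype for that window.
    """
    found = defaultdict(set)
    for read in archive:
        read_end = read.start + len(read.seq)
        if read.contig != contig or read.start > win_start or read_end < win_end:
            continue  # wrong contig, or read does not span the full window
        offset = win_start - read.start  # valid only for ungapped alignments
        hap = read.seq[offset:offset + (win_end - win_start)]
        found[hap].add(read.accession)
    return dict(found)


if __name__ == "__main__":
    archive = [
        MappedRead("acc1", "chr1", 100, "ACGTACGTAC"),
        MappedRead("acc1", "chr1", 102, "GTACGTACGG"),
        MappedRead("acc2", "chr1", 101, "CGTTCGTACG"),
        MappedRead("acc2", "chr2", 100, "ACGTACGTAC"),
    ]
    # Hypothetical window chr1:104-108; in practice it would come from a
    # homology search of a crop gene against the reference assembly.
    for hap, accs in sorted(haplotypes_in_window(archive, "chr1", 104, 108).items()):
        print(hap, sorted(accs))
```

In this toy data the two accessions differ by a single variant at position 104, so the window yields two distinct haplotypes, `ACGT` in `acc1` and `TCGT` in `acc2`. A real archive would more likely be stored in a standard alignment format and handle reads with insertions and deletions, which this sketch deliberately ignores.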