Research Project: Conservation and Utilization of Priority Vegetable Crop Genetic Resources and Associated Information

Location: Plant Genetic Resources Unit (PGRU)

Title: Onion (Allium cepa) pseudoreference genome

Author
Labate, Joanne
Glaubitz, Jeffrey - Cornell University
Havey, Michael - US Department of Agriculture (USDA)

Submitted to: Dryad Digital Repository
Publication Type: Other
Publication Acceptance Date: 8/13/2020
Publication Date: 8/13/2020
Citation: Labate, J.A., Glaubitz, J., Havey, M. 2020. Onion (Allium cepa) pseudoreference genome. Dryad Digital Repository. https://doi.org/10.5061/dryad.6wwpzgmwg.
DOI: https://doi.org/10.5061/dryad.6wwpzgmwg

Interpretive Summary: Genome sequences of vegetable crops are valuable tools for breeders and other researchers. These sequences help researchers discover genes that can be used to develop new crop traits that improve ease of growing, yield, and quality of the consumed product. Onion is a widely grown and economically valuable vegetable crop with a range of culinary uses in both processed and fresh forms. The complete genome sequence of onion has not been published because it is very large and complex, and contains a low density of genes within enormous stretches of DNA of unknown function. We performed DNA sequencing on 46 onion plants descended from a cross between a single pair of parents. Their close relationships to each other and the small differences between their genomes allowed us to use computer tools to develop a high-quality, partial genome sequence of onion. This sequence has been made public and will be useful for developing new onion varieties based on gene markers.

Technical Abstract: Forty-six F2 plants and the parents of the onion (Allium cepa) mapping population Brigham Yellow Globe 15-23 x Ailsa Craig 43 were genotyped, along with two doubled haploid (DH) onion lines, DH2107 and DH2110, which served as completely homozygous controls. Genotyping by sequencing (GBS) was performed on an Illumina HiSeq 2000 using two to four replicates of every DNA sample. GBS libraries were prepared at Cornell University’s Genomic Diversity Facility using the restriction enzyme EcoT22I and assayed in 96-plex format using standard protocols. SNP calling on the 46 F2 plants, the two parents, and the two DH lines was performed using the TASSEL 3.0 Universal Network Enabled Analysis Kit (UNEAK) bioinformatics pipeline, which does not require a reference genome. Over 70,000 raw SNPs were scored in these samples. Quality filters were then applied to the SNPs as follows: not heterozygous in either DH line, minor allele frequency greater than or equal to 30%, minimum genotypic read depth of seven, maximum missing data of 10%, and conformity to the expected 1:2:1 segregation ratio (goodness-of-fit > 0.01) within the F2 family. For the resulting 752 SNPs, the MSTMap software tool was used to construct a genetic linkage map using a grouping LOD criterion of p < 1 x 10^-7. This gave 701 SNPs in 15 linkage groups (LGs) with ≥ 15 markers each (the remaining 51 markers were not placed on a linkage group). The number of SNPs per LG ranged from 15 to 90, and the estimated size of the LGs ranged from 52 to 327 cM. Because UNEAK treats redundant, reverse-complement tags from opposite strands as separate markers, 171 redundant tag pairs were eliminated from this linkage map. A pseudo-reference genome was constructed by concatenating one tag from each of the 530 non-redundant, mapped tag pairs into a single pseudo-molecule. To prevent spurious alignment across two distinct pseudo-reference tags, adjacent tags in the pseudo-reference were separated by a span of at least 32 A nucleotides.
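The redundancy removal and concatenation steps described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice of exactly 32 A nucleotides for the spacer (the abstract says "at least 32"), and the simplification of treating each marker as a single tag rather than a tag pair are all assumptions made for illustration.

```python
# Hypothetical sketch: build a pseudo-reference from mapped GBS tags.
# Names and the exact spacer length are illustrative assumptions.

SPACER = "A" * 32  # abstract specifies "at least 32 A nucleotides"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGTN)."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_reverse_complement_duplicates(tags):
    """Keep the first tag seen from each tag / reverse-complement pair."""
    seen = set()
    kept = []
    for tag in tags:
        # A tag and its reverse complement share the same canonical key.
        canonical = min(tag, reverse_complement(tag))
        if canonical not in seen:
            seen.add(canonical)
            kept.append(tag)
    return kept


def build_pseudo_reference(tags, spacer=SPACER):
    """Concatenate tags into one pseudo-molecule separated by the spacer.

    Returns the sequence and the 0-based start offset of each tag, so that
    alignments to the pseudo-molecule can be traced back to a tag.
    """
    parts = []
    offsets = []
    position = 0
    for index, tag in enumerate(tags):
        if index > 0:
            parts.append(spacer)
            position += len(spacer)
        offsets.append(position)
        parts.append(tag)
        position += len(tag)
    return "".join(parts), offsets


if __name__ == "__main__":
    example_tags = ["ACGTT", "AACGT", "GGGCC"]  # toy sequences, not real data
    unique_tags = drop_reverse_complement_duplicates(example_tags)
    sequence, starts = build_pseudo_reference(unique_tags)
    print(unique_tags, starts, len(sequence))
```

Recording the start offset of each tag is one way to map aligned reads back to their tag locus; the spacer keeps a single read from aligning across two neighboring tags.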
The purpose of the pseudo-reference was to allow discovery, in 94 diverse onion accessions, of additional SNPs within each tag pair locus that were not segregating in the mapping population, thereby reducing the ascertainment bias that would result from a population survey using only SNPs discovered in a single F2 family.
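The per-SNP quality filters listed in the Technical Abstract (minor allele frequency, read depth, missing data, and 1:2:1 segregation) could be applied roughly as in the Python sketch below. It is an illustration under stated assumptions, not the pipeline actually used: the genotype encoding, the function names, the choice to treat calls below the depth threshold as missing, and the use of a chi-square test with two degrees of freedom for the goodness-of-fit are all assumptions. The doubled haploid heterozygosity filter is omitted for brevity.

```python
import math

# Hypothetical sketch of the F2 SNP filters; thresholds follow the abstract,
# everything else (names, encoding, test choice) is an assumption.


def segregation_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit p-value against a 1:2:1 ratio.

    With three genotype classes there are 2 degrees of freedom, for which
    the chi-square survival function is exactly exp(-x / 2).
    """
    total = n_aa + n_ab + n_bb
    observed = (n_aa, n_ab, n_bb)
    expected = (total / 4, total / 2, total / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.exp(-chi2 / 2)


def passes_filters(calls, min_maf=0.30, min_depth=7,
                   max_missing=0.10, min_p=0.01):
    """Decide whether one SNP passes the filters.

    `calls` is a list of (genotype, depth) tuples, one per F2 plant, where
    genotype is "AA", "AB", "BB", or None for a missing call.
    Assumption: calls with depth below `min_depth` are treated as missing.
    """
    called = [g for g, depth in calls if g is not None and depth >= min_depth]
    missing_fraction = 1 - len(called) / len(calls)
    if missing_fraction > max_missing:
        return False

    n_aa = called.count("AA")
    n_ab = called.count("AB")
    n_bb = called.count("BB")
    n = n_aa + n_ab + n_bb

    # Minor allele frequency from genotype counts.
    freq_b = (2 * n_bb + n_ab) / (2 * n)
    if min(freq_b, 1 - freq_b) < min_maf:
        return False

    return segregation_p_value(n_aa, n_ab, n_bb) > min_p


if __name__ == "__main__":
    # Toy data for 46 plants, not real genotypes.
    toy = [("AA", 10)] * 12 + [("AB", 10)] * 22 + [("BB", 10)] * 12
    print(passes_filters(toy))
```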