Author
GLAUBITZ, JEFFREY - Cornell University
CASSTEVENS, TERRY - Cornell University
LU, FEI - Cornell University
HARRIMAN, JAMES - Cornell University
ELSHIRE, ROBERT - Cornell University
SUN, QI - Cornell University
Buckler, Edward - Ed
Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/28/2014
Publication Date: 2/28/2014
Citation: Glaubitz, J.C., Casstevens, T.M., Lu, F., Harriman, J., Elshire, R.J., Sun, Q., Buckler IV, E.S. 2014. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. 9(2):e90346.
Interpretive Summary: Genotyping by sequencing (GBS) is a method that adapts modern sequencing technologies to efficiently genotype large numbers of individuals at hundreds of thousands of locations across the genome. The relatively straightforward, robust, and cost-effective GBS protocol is currently being applied in numerous species by a large number of researchers. In this paper, we describe new bioinformatic software, called TASSEL-GBS, designed for the efficient processing of raw GBS sequence data into genotypes. The TASSEL-GBS pipeline successfully fulfills the following key design criteria: (1) Ability to run on the modest computing resources that are typically available to small breeding or ecological research programs, including desktop or laptop machines with only 8-16 GB of RAM, (2) Scalability from small to extremely large studies, where hundreds of thousands or even millions of DNA positions can be scored in up to 100,000 individuals (e.g., for large breeding programs or genetic surveys), and (3) Applicability in an accelerated breeding context, where rapid turnover is required from tissue collection to genotypes. In this paper, we describe the TASSEL-GBS pipeline in detail and demonstrate it by performing a large scale, species wide analysis in maize (Zea mays). We estimated that the genotypes produced by this analysis had an average error rate of only 0.0042. Overall, the GBS assay and the TASSEL-GBS pipeline provide robust tools for studying genetic diversity.
Technical Abstract: Genotyping by sequencing (GBS) is a next generation sequencing based method that takes advantage of reduced representation to enable high throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is currently being applied in numerous species by a large number of researchers. Herein we describe a bioinformatics pipeline, TASSEL-GBS, designed for the efficient processing of raw GBS sequence data into SNP genotypes. The TASSEL-GBS pipeline successfully fulfills the following key design criteria: (1) Ability to run on the modest computing resources that are typically available to small breeding or ecological research programs, including desktop or laptop machines with only 8–16 GB of RAM, (2) Scalability from small to extremely large studies, where hundreds of thousands or even millions of SNPs can be scored in up to 100,000 individuals (e.g., for large breeding programs or genetic surveys), and (3) Applicability in an accelerated breeding context, requiring rapid turnover from tissue collection to genotypes. Although a reference genome is required, the pipeline can also be run with an unfinished “pseudo-reference” consisting of numerous contigs. We describe the TASSEL-GBS pipeline in detail and benchmark it based upon a large scale, species wide analysis in maize (Zea mays), where the average error rate was reduced to 0.0042 through application of population genetic-based SNP filters. Overall, the GBS assay and the TASSEL-GBS pipeline provide robust tools for studying genomic diversity.