AN INTEGRATED DATABASE AND BIOINFORMATICS RESOURCE FOR SMALL GRAINS
Location: Genomics and Gene Discovery
Title: Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence
Authors: You, Frank; Huo, Naxin; Deal, Karin; Luo, Ming-Cheng; Mcguire, Pat; Dvorak, Jan
Submitted to: Biomed Central (BMC) Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: January 25, 2011
Publication Date: January 25, 2011
Citation: You, F., Huo, N., Deal, K., Gu, Y.Q., Luo, M., Mcguire, P., Dvorak, J., Anderson, O.D. 2011. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. Biomed Central (BMC) Genomics. 12:59.
The newest generation of DNA sequencing machines provides much larger quantities of DNA sequence data at much lower cost. Here we show that one use of this new technology is identifying hundreds of thousands of markers distributed throughout a plant genome. In this study, approximately 500,000 markers were identified in the D genome of bread wheat. This ability to generate large-scale marker resources will aid further development of genomic technologies for large plant genomes such as wheat and will help make such resources and advances available to wheat breeders.
An annotation-based, genome-wide SNP discovery pipeline is reported that uses NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype, generated with SOLiD or Solexa, are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome. Its genome is 4.02 Gb in size, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome.
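The core idea of the pipeline described above (annotate the long reads of one genotype, then call SNPs only where short reads of the other genotype map to single-copy, genic, or repeat-junction sequence) can be illustrated with a minimal Python sketch. This is not the AGSNP code. The names `RefRead`, `call_snps`, and `CALLABLE`, the category labels, and the depth and agreement thresholds are all hypothetical and chosen only for illustration. The final lines add up the three SNP counts reported in the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Annotation categories in which SNPs are called (labels are illustrative).
CALLABLE = {"gene", "single_copy", "repeat_junction"}


@dataclass
class RefRead:
    read_id: str
    seq: str
    category: str  # e.g. "gene", "single_copy", "repeat_junction", "repeat"


def call_snps(ref, pileup, min_depth=3, min_fraction=0.9):
    """Return putative SNPs as (read_id, position, ref_base, alt_base).

    `pileup` maps a 0-based position on the reference read to the list of
    bases observed there in the mapped short reads. The thresholds are
    assumed values, not those used in the study.
    """
    if ref.category not in CALLABLE:
        return []  # skip repetitive or paralogous sequence entirely
    snps = []
    for pos, bases in sorted(pileup.items()):
        if len(bases) < min_depth:
            continue  # too few mapped reads to trust the call
        base, count = Counter(bases).most_common(1)[0]
        if base != ref.seq[pos] and count / len(bases) >= min_fraction:
            snps.append((ref.read_id, pos, ref.seq[pos], base))
    return snps


# Toy data: one genic read and one repetitive read with the same pileup.
pileup = {2: ["T", "T", "T"], 4: ["A", "A"], 5: ["C", "C", "C"]}
gene_read = RefRead("r1", "ACGTAC", "gene")
repeat_read = RefRead("r2", "ACGTAC", "repeat")

gene_snps = call_snps(gene_read, pileup)
repeat_snps = call_snps(repeat_read, pileup)

# SNP counts per category as reported in the abstract.
reported = {"gene": 195_631, "single_copy": 155_580, "repeat_junction": 145_907}
total_reported = sum(reported.values())
```

In the toy data, only position 2 of the genic read qualifies: position 4 has too little coverage, position 5 matches the reference base, and the repetitive read is excluded by its annotation. The reported counts sum to 497,118, consistent with the approximately 500,000 markers mentioned in the summary.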