
Title: DEVELOPMENT OF ALGORITHMS FOR PREDICTION AND VALIDATION OF POLYMORPHISMS IN POLYPLOIDS (SOYBEAN) USING EST DATA

Authors
MATUKUMALLI, LAKSHMI - GEORGE MASON UNIVERSITY
GREFENSTETTE, JOHN - GEORGE MASON UNIVERSITY
Van Tassell, Curtis - Curt
CHOII, IK-YOUNG - 1275-31-00
Cregan, Perry

Submitted to: International Genome Sequencing and Analysis Conference
Publication Type: Abstract Only
Publication Acceptance Date: 8/3/2004
Publication Date: 10/25/2004
Citation: Matukumalli, L.K., Grefenstette, J.J., Van Tassell, C.P., Choii, I., Cregan, P.B. 2004. Development of algorithms for prediction and validation of polymorphisms in polyploids (soybean) using EST data [abstract]. TIGR's Sixteenth International Genome Sequencing and Analysis Conference. p. 33.


Technical Abstract: Identifying polymorphisms in polyploid species with complex genomes is difficult, yet such polymorphisms can help characterize variation that confers disease resistance, improved quality, increased tolerance and increased productivity. For these species, sequencing efforts have been limited to expressed sequence tags (ESTs), which provide information about their genes. In this study we used soybean as a model for polyploid genomes. Existing EST assemblies (e.g., the TIGR Gene Index) were found to be unsuitable for polymorphism detection because they did not use sequence quality information and were built on a diploid gene model. We developed a gene model for the computational analysis of polyploids that accounts for paralogs (i.e., duplicate gene copies) and alternative splice sites when analyzing EST data. We also developed two algorithmic methods for distinguishing paralogs and predicting polymorphisms. These predictions were tested experimentally using a high-throughput software pipeline, SNP-PHAGE (SNP - Pipeline for Haplotype analysis and GEnbank [dbSNP] submissions), developed as part of this project. Approximately 6,000 expert-verified polymorphisms were discovered using this pipeline. The bioinformatics tools developed in this project were generalized to apply to other polyploid species such as wheat, cotton, canola, corn, potato and alfalfa, and are available as open source.
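The abstract does not describe the two algorithms in detail. As a rough illustration of the kind of reasoning involved, the Python sketch below classifies a single alignment column of an EST cluster under two stated assumptions: each EST library comes from an inbred (homozygous) genotype, and every base call carries a Phred quality score. Under those assumptions, two high-quality bases within one genotype suggest that reads from different paralogs were merged (or that errors slipped through), while bases that are consistent within each genotype but differ between genotypes are candidate SNPs. The `Read` type, the `classify_column` function and the quality and depth thresholds are hypothetical and are not taken from the authors' tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """One EST base observed at a single alignment column (hypothetical input)."""
    genotype: str  # cultivar/line the EST library came from
    base: str      # called base: A, C, G or T
    quality: int   # Phred quality score of that base call


def classify_column(reads, min_quality=20, min_reads=2):
    """Classify one alignment column of an EST cluster (illustrative sketch).

    Returns "monomorphic", "candidate_snp", "paralog_or_error" or "insufficient".
    Assumes inbred (homozygous) genotypes, so a single-copy locus should show
    exactly one base per genotype.
    """
    # Use sequence quality: keep only base calls at or above the Phred cutoff.
    good = [r for r in reads if r.quality >= min_quality]
    if len(good) < min_reads:
        return "insufficient"

    # Count high-quality reads supporting each base within each genotype.
    per_genotype = defaultdict(lambda: defaultdict(int))
    for r in good:
        per_genotype[r.genotype][r.base] += 1

    # Two different bases inside one homozygous genotype point to merged
    # paralogs (or residual errors) rather than an allelic difference.
    if any(len(bases) > 1 for bases in per_genotype.values()):
        return "paralog_or_error"

    # Each genotype is internally consistent; compare alleles across genotypes.
    alleles = {next(iter(bases)) for bases in per_genotype.values()}
    return "candidate_snp" if len(alleles) > 1 else "monomorphic"


# Usage: genotype "A" consistently shows C, genotype "B" shows T.
print(classify_column([Read("A", "C", 30), Read("A", "C", 35), Read("B", "T", 32)]))
```

A real pipeline would apply a test like this to every column of a multiple alignment and would also need to account for alternative splicing, which this sketch ignores.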