Publication : USDA ARS

Publication #131510

Title: DATA MINING FOR SIMPLE SEQUENCE REPEATS IN EXPRESSED SEQUENCE TAGS FROM BARLEY, MAIZE, SORGHUM AND WHEAT

Author

	KANTETY, RAMESH
	LAROTA, MAURICIO
	Matthews, David
	SORRELLS, MARK

Submitted to: Plant Molecular Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/5/2002
Publication Date: 3/1/2002
Citation: N/A

Interpretive Summary: Publicly available sequences derived from genes of barley, maize, rice, sorghum and wheat were searched to identify SSR (simple sequence repeat) regions. Sequences containing these regions were identified so that molecular markers can be developed that should be both highly polymorphic within a species and cross-homologous among these important grass genera.

Technical Abstract: Plant genomics projects involving model species and many agriculturally important crops are resulting in a rapidly increasing database of genomic and expressed sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used in the analysis of both structural and functional relationships in these genomes. We analyzed over 260,000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, we identified several ESTs that are related to the SSR-ESTs by BLAST analysis. We found that about 10% of the EST sequences belong to different parts of the mRNA transcripts that contain SSR motifs. The SSR-ESTs were clustered within each species to reduce redundancy and to produce longer consensus sequences. The consensi and singleton sequences from each species were pooled and clustered to identify cross-species matches. Overall, a reduction in redundancy of 84% was observed when the resulting consensi and singleton sequences (3,569) were compared to the total number of EST sequences analyzed (24,606). This information can be useful for the development of SSR markers that can amplify across the grass genera for comparative mapping and genetics.
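The SSR search step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the search criteria or the software used, so everything here is an assumption made for illustration: the function names `find_ssrs` and `ssr_est_frequency`, the restriction to di-, tri- and tetranucleotide motifs, and the 18 bp minimum repeat length are all hypothetical choices, not the authors' method.

```python
import re


def _is_shorter_repeat(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    k = len(motif)
    return any(
        k % d == 0 and motif == motif[:d] * (k // d)
        for d in range(1, k)
    )


def find_ssrs(seq, min_len=18, unit_sizes=(2, 3, 4)):
    """Return (start, motif, repeat_count) for each perfect SSR in seq.

    min_len and unit_sizes are illustrative thresholds, not values
    taken from the publication.
    """
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # Smallest number of units whose total length reaches min_len.
        min_repeats = -(-min_len // k)
        # A k-mer followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip homopolymer runs and motifs already covered by a shorter unit.
            if _is_shorter_repeat(motif):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def ssr_est_frequency(ests):
    """Fraction of sequences in the dict {name: sequence} containing an SSR."""
    with_ssr = sum(1 for s in ests.values() if find_ssrs(s))
    return with_ssr / len(ests)


if __name__ == "__main__":
    # Toy sequences invented for this example; they are not real ESTs.
    toy_ests = {
        "est1": "GGT" + "CA" * 10 + "TTG",
        "est2": "ACGTTGCATCGATCGGATCA",
        "est3": "A" * 25,
        "est4": "TT" + "AGC" * 6 + "G",
    }
    for name, seq in toy_ests.items():
        print(name, find_ssrs(seq))
    print("SSR-EST frequency:", ssr_est_frequency(toy_ests))
```

Applied to a real EST collection, a per-species call to `ssr_est_frequency` would give a figure comparable in kind to the 1.5% to 4.7% range reported above, although the exact values depend heavily on the thresholds chosen. The clustering and BLAST steps are not covered by this sketch.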