Publication #131580

Title: DATA MINING APPROACHES FOR IDENTIFICATION OF ELITE GENOTYPES IN GERMPLASM COLLECTIONS OF RICE USING MOLECULAR MARKER INFORMATION

Author
Capdevielle, Fabian - LSU
Pinson, Shannon
Oard, James - LSU

Submitted to: Rice Technical Working Group Meeting Proceedings
Publication Type: Proceedings
Publication Acceptance Date: 2/8/2002
Publication Date: 6/1/2002
Citation: N/A

Interpretive Summary:

Technical Abstract: Modern information technology, built on powerful computer systems, is providing new tools to collect, transfer, store, and combine agronomic and molecular data from breeding lines and germplasm collections. As a consequence, data mining approaches based on clustering, classification, and association analysis could be applied to discover useful patterns in these data. This project evaluated a stepwise discriminant analysis (DA) procedure for its ability to use molecular markers to allocate rice lines into predetermined groups. DA was applied to classify rice lines into germplasm classes (indica, japonica-temperate, and japonica-tropical) and phenotypic classes (high versus low cold tolerance, seedling vigor, plant height, and days to heading) as previously defined by Dr. Dave Mackill. Five to ten markers selected by DA classified rice lines into germplasm classes with 90 to 98 percent accuracy. Phenotypic classes among the diverse germplasm were correctly predicted 80 to 90 percent of the time. DA-selected markers were compared with quantitative trait loci (QTLs) by applying DA and interval analysis to the same sheath blight resistance data set, collected from the Lemont/Teqing gene-mapping population. When DA was based on the extreme tails of the phenotypic distribution (3 SD between the resistant (R) and susceptible (S) groups), 10 markers provided 100 percent correct classification of the recombinant inbred lines (RILs). Accuracy dropped to 80 percent when markers were selected using less extreme progeny (1 SD between R and S). There was notable agreement between the DA-selected markers and the resistance QTLs, which suggests DA may be useful for identifying chromosomal regions containing genes. There was also some disagreement between DA-selected markers and QTLs, which we are investigating further.
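The idea of selecting a small marker set that allocates lines to predetermined classes can be illustrated with a minimal sketch. This is not the procedure used in the study: it swaps in a greedy forward selection scored by the leave-one-out accuracy of a nearest-centroid rule, whereas stepwise DA is usually driven by a statistic such as Wilks' lambda. The 0/1 marker coding, the toy genotypes, and the function names `centroid_accuracy` and `forward_select` are all assumptions made for illustration.

```python
def centroid_accuracy(rows, labels, markers):
    """Leave-one-out accuracy of a nearest-centroid rule on the chosen markers."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        sums, counts = {}, {}
        # Build class centroids from every line except the held-out line i.
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            total = sums.setdefault(other_label, [0.0] * len(markers))
            for k, m in enumerate(markers):
                total[k] += other[m]
            counts[other_label] = counts.get(other_label, 0) + 1

        def sq_dist(cls):
            # Squared Euclidean distance from the held-out line to a centroid.
            return sum((row[m] - sums[cls][k] / counts[cls]) ** 2
                       for k, m in enumerate(markers))

        if min(sums, key=sq_dist) == label:
            correct += 1
    return correct / len(rows)


def forward_select(rows, labels, max_markers):
    """Greedily add the marker that most improves leave-one-out accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < max_markers:
        scored = [(centroid_accuracy(rows, labels, selected + [m]), -m, m)
                  for m in range(len(rows[0])) if m not in selected]
        if not scored:
            break
        acc, _, marker = max(scored)  # ties go to the lowest marker index
        if acc <= best_acc:
            break  # no remaining marker improves classification
        selected.append(marker)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: 6 lines x 4 markers, alleles coded 0/1 (assumed coding).
rows = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
labels = ["indica", "indica", "indica", "japonica", "japonica", "japonica"]

selected, accuracy = forward_select(rows, labels, max_markers=3)
print(selected, accuracy)  # [2] 1.0
```

In this toy set only the marker at index 2 separates the two classes, so selection stops after one marker. With real germplasm data the same loop would typically add several markers before accuracy stops improving, and the accuracy estimate would depend on how the classes and the validation scheme are defined.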