
Title: An integrated approach to exploit linkage disequilibrium for ultra high dimensional genome-wide data

Authors:
CARLSEN, MICHELLE - Utah State University
FU, GUIFANG - Utah State University
Bushman, Shaun
CORCORAN, CHRIS - Utah State University

Submitted to: PLoS Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/19/2015
Publication Date: 12/9/2015
Citation: Carlsen, M., Fu, G., Bushman, B.S., Corcoran, C. 2015. An integrated approach to exploit linkage disequilibrium for ultra high dimensional genome-wide data. PLoS Genetics. 202:411-426.

Interpretive Summary: Recent DNA sequencing methods (which determine the order of nucleotides in a DNA molecule) can quickly produce millions of DNA sequences. As a result, variation among sequences across a genome (all the DNA contained in an organism's chromosomes) can now be tested for association with economically important traits on a large scale. Testing for association between a plant trait and many single-base differences in DNA sequence (single nucleotide polymorphisms, or SNPs) in this way is called a genome-wide association analysis. However, when the few SNPs that actually cause a trait (e.g., disease resistance) to vary must be found among hundreds of thousands of other SNPs, statistical models often falsely identify SNPs as associated with the trait. To address this issue, an improved statistical approach (DCRR) was developed to find the true associations between a large number of SNPs and a trait of interest. DCRR was applied to 48 computer simulations and to a real plant data set. In each case, compared with other popular statistical models, DCRR was better at finding the SNPs that were truly associated with the traits and at excluding false SNPs that were not. The results give scientists a method to improve the accuracy and efficiency of large genome-wide association studies, which could shorten the time needed to deliver improved plant cultivars to the public.
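To make the false-positive problem concrete, the following is a minimal Python sketch of a conventional one-SNP-at-a-time association scan. It is not the DCRR method, whose details are not described here. Everything in it is an illustrative assumption: the function names `simulate`, `p_value` and `scan`, the constant `CAUSAL`, and the toy data (400 individuals, 1,000 independent SNPs with minor-allele frequency 0.3, three causal SNPs with additive effect 0.8, normal noise). Each SNP is tested with a Pearson correlation, and hits are counted at a nominal p < 0.05 and at a Bonferroni-corrected threshold.

```python
import math
import random

CAUSAL = (0, 1, 2)  # hypothetical: indices of the SNPs that truly affect the trait


def simulate(n_ind=400, n_snp=1000, maf=0.3, effect=0.8, seed=1):
    """Simulate additive genotypes (0, 1 or 2 minor alleles) and a quantitative trait.

    Toy assumption: SNPs are independent (no linkage disequilibrium) and only
    the SNPs in CAUSAL contribute to the trait, plus standard-normal noise.
    """
    rng = random.Random(seed)
    geno = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n_snp)]
            for _ in range(n_ind)]
    pheno = [sum(effect * g[j] for j in CAUSAL) + rng.gauss(0.0, 1.0) for g in geno]
    return geno, pheno


def p_value(x, y):
    """Two-sided p-value for a Pearson correlation, using a normal
    approximation to the t-statistic (adequate for hundreds of individuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or constant trait: nothing to test
    r = sxy / math.sqrt(sxx * syy)
    if r * r >= 1.0:
        return 0.0  # perfect correlation
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))  # P(|Z| > |t|)


def scan(geno, pheno, alpha=0.05):
    """Test every SNP on its own; return hits at the nominal and Bonferroni thresholds."""
    n_snp = len(geno[0])
    pvals = [p_value([g[j] for g in geno], pheno) for j in range(n_snp)]
    nominal = {j for j, p in enumerate(pvals) if p < alpha}
    bonferroni = {j for j, p in enumerate(pvals) if p < alpha / n_snp}
    return nominal, bonferroni


geno, pheno = simulate()
nominal, bonferroni = scan(geno, pheno)
false_nominal = nominal - set(CAUSAL)        # null SNPs passing p < 0.05
false_bonferroni = bonferroni - set(CAUSAL)  # null SNPs passing p < 0.05 / n_snp
print(f"p < 0.05: {len(nominal)} hits, {len(false_nominal)} false")
print(f"Bonferroni: {len(bonferroni)} hits, {len(false_bonferroni)} false")
```

With 997 null SNPs tested at p < 0.05, about 50 would be expected to pass by chance alone. Dividing the threshold by the number of tests (Bonferroni) removes most of them. However, with hundreds of thousands of SNPs the threshold becomes very strict, and the correction ignores the correlation that linkage disequilibrium creates between neighboring SNPs. That is the kind of setting an integrated approach such as DCRR is meant to handle better.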
