United States Department of Agriculture

Agricultural Research Service

Research Project: REDESIGNING FORAGE GERMPLASM AND PRODUCTION SYSTEMS FOR EFFICIENCY, PROFIT, AND SUSTAINABILITY OF DAIRY FARMS

Title: Simple regression models as a threshold for selecting AFLP loci with reduced error rates

Authors: Price, David; Casler, Michael

Submitted to: BMC Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: October 10, 2012
Publication Date: October 16, 2012
Citation: Price, D., Casler, M.D. 2012. Simple regression models as a threshold for selecting AFLP loci with reduced error rates. BMC Bioinformatics. 13(1):268.

Interpretive Summary: The use of DNA markers to describe evolutionary and taxonomic relationships between plants has become commonplace. We routinely use DNA markers in many of our genetic studies of both cool-season and warm-season grasses. Some DNA markers are subject to errors in sequencing or amplification, which can lead to misleading or incorrect results and conclusions. This paper describes a simple method for screening dominant DNA markers for unexpectedly high error rates so that those markers can be eliminated from further analyses. In an analysis of data from big bluestem, the method reduced the average error rate from 13% to 6%. These results should be of value to anyone who uses dominant markers that are subject to significant error rates.

Technical Abstract: Amplified fragment length polymorphism is a popular DNA marker technique that has applications in multiple fields of study. Technological improvements and decreasing costs have dramatically increased the number of markers that can be generated in an amplified fragment length polymorphism experiment. As data sets increase in size, the number of genotyping errors also increases. Error within a DNA marker data set can result in reduced statistical power, incorrect conclusions, and decreased reproducibility. It is essential that error within a data set be recognized and reduced where possible. Using simple linear regression, a second-degree polynomial model was fit to describe the relationship between locus-specific error rate and the frequency of present alleles. This model was then used to set a moving error rate threshold that varied with the frequency of present alleles at a given locus. Loci with error rates greater than the threshold were removed from further analyses. This method of selecting loci is advantageous because it accounts for differences in error rate between loci with different frequencies of present alleles. The method is demonstrated on an amplified fragment length polymorphism data set generated from the North American prairie species big bluestem. Within this data set, the error rate was reduced from 12.5% to 8.8% by removal of loci with error rates greater than the defined threshold. By repeating the method on the selected loci, the error rate was further reduced to 5.9%. This reduction in error resulted in a substantial increase in the amount of genetic variation attributable to regional and population variation. This paper demonstrates a logical and computationally simple method for selecting loci with a reduced error rate. In the context of a genetic diversity study, this method resulted in an increased ability to detect differences between populations. Further application of this locus selection method, in addition to error-reducing methodological precautions, will result in amplified fragment length polymorphism data sets with reduced error rates. This reduction in error rate should result in greater power to detect differences and increased reproducibility.
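
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each locus already has an estimated error rate and a frequency of present alleles, and that the threshold is the fitted second-degree curve itself. The function name `select_loci`, its parameters, the tolerance `tol`, and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def select_loci(freq, err, rounds=1, degree=2, tol=1e-9):
    """Return a boolean mask of loci kept after threshold selection.

    freq : frequency of present alleles at each locus
    err  : locus-specific error rate at each locus
    Assumption: the moving threshold is the fitted polynomial itself;
    loci whose error rate exceeds it (by more than tol) are removed.
    """
    freq = np.asarray(freq, dtype=float)
    err = np.asarray(err, dtype=float)
    keep = np.ones(freq.shape, dtype=bool)
    for _ in range(rounds):
        idx = np.flatnonzero(keep)
        if idx.size <= degree:
            break  # too few loci left to fit the polynomial
        # Least-squares fit of error rate on allele frequency
        # (linear in its coefficients, second degree in freq).
        coeffs = np.polyfit(freq[idx], err[idx], degree)
        threshold = np.polyval(coeffs, freq[idx])
        # Drop loci above the frequency-dependent threshold.
        keep[idx[err[idx] > threshold + tol]] = False
    return keep


if __name__ == "__main__":
    # Simulated loci, for illustration only (not the big bluestem data).
    rng = np.random.default_rng(0)
    freq = rng.uniform(0.05, 0.95, size=200)
    err = np.clip(0.4 * freq * (1 - freq) + rng.normal(0, 0.03, 200), 0, 1)
    for rounds in (1, 2):
        keep = select_loci(freq, err, rounds=rounds)
        print(rounds, int(keep.sum()), float(err[keep].mean()))
```

Setting `rounds=2` refits the model on the loci that survived the first pass, which corresponds to repeating the method on the selected loci. Because the threshold moves with allele frequency, a locus is judged against other loci of similar frequency rather than against a single global cutoff.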

Last Modified: 4/17/2014