Publication #166407

Title: IDENTIFICATION OF ERRORS IN COTTON FIBER DATA SETS USING BAYESIAN NETWORKS

Author
Sassenrath, Gretchen
BOGGESS, J. - MISS. STATE UNIVERSITY
BI, XINTONG - MISS. STATE UNIVERSITY
PRINGLE, H. - DELTA RES. & EXT. CTR.

Submitted to: Applied Statistics In Agriculture Conference Proceedings
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/11/2005
Publication Date: 6/15/2005
Citation: Sassenrath, G.F., Boggess, J.E., Bi, X., Pringle, H.C. 2005. Identification of errors in cotton fiber data sets using Bayesian networks. Applied Statistics In Agriculture Conference Proceedings. CD-ROM, pp. 287-295.

Interpretive Summary: Two areas of research, precision agriculture and breeding, have simultaneously recognized the impact of harvesting and ginning methods on the final quality of cotton lint. Because both areas aim to improve profitability for cotton producers, research results need to be comparable to those a producer would obtain. In the research setting, however, common harvesting and ginning practices differ from those used by producers. Researchers routinely hand-harvest cotton and then gin it in small research gins. The research gins are equipped with saws that remove the lint from the seed and then pass the lint through rollers to the collection area. No preprocessing of the seed cotton is performed. Producers, in contrast, harvest cotton by machine and gin it in large production gins. These large gins have ovens for drying the cotton, and leaf and stick machines for removing leaf fragments and other trash from the cotton. After this initial processing, the cotton is ginned to separate the lint from the seed and packaged for shipping. Current methods of determining cotton fiber quality and price make it difficult to translate research results to producer results through standard statistical methods. This study was undertaken to develop a method of translating small-scale, research-level results to full-scale, production-level results. The research reported here is the first stage in that study and demonstrates the use of Bayesian networks to detect erroneous entries in cotton fiber data sets.

Technical Abstract: Cotton fiber is graded on a series of parameters based on physiological factors (strength, length, and thickness), lint color, and the presence of non-lint matter such as leaves, stems, or other foreign materials. Cotton lint is graded by the USDA-AMS after harvest and ginning, and the grade determines the price of the lint. Given the importance of cotton fiber quality to the value of the crop, the spatial variability of cotton fiber properties is of particular interest to researchers and producers in developing management scenarios for optimal profitability. Previous research studies have relied on hand-harvesting cotton at intervals throughout the field to measure cotton fiber quality and the extent of its spatial variability. However, hand-harvested cotton differs in quality from cotton harvested by machine and ginned in large-scale production gins. Part of this difference arises from the difference in harvest efficiency between machines and humans, and part results from the different gins used for the smaller sample sizes. While these studies have demonstrated the extent of spatial variability of fiber properties, hand-harvesting is not amenable to large-scale or production research efforts. Moreover, the differences in fiber properties limit the extension of the results to the production setting. We have developed a mechanism for sampling cotton from the cotton chute during mechanical harvest. The samples are then ginned in a research gin. This study was undertaken to develop a method of translating small-scale, research-level results to full-scale, production-level results. The research reported here is the first step in that effort and demonstrates the use of Bayesian networks to detect erroneous entries in cotton fiber data sets.
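
Illustrative sketch (not taken from the paper): the abstract does not describe the network structure, the variables, or the decision rule the authors used, so the following Python example only shows the general idea of flagging improbable entries with a small discrete Bayesian network. The structure in `PARENTS`, the three-level discretization in `STATES`, the smoothing constant, the 0.2 threshold, and all records are hypothetical and synthetic.

```python
from collections import Counter

# Hypothetical structure: each fiber property maps to its parent variables.
# This is an assumption for illustration, not the network from the study.
PARENTS = {"length": (), "strength": ("length",), "micronaire": ("length",)}
STATES = ("low", "mid", "high")  # assumed discretization of each property


def fit_cpts(records, alpha=1.0):
    """Estimate P(var | parents) from discretized records (Laplace smoothing)."""
    joint, marginal = Counter(), Counter()
    for rec in records:
        for var, parents in PARENTS.items():
            ctx = tuple(rec[p] for p in parents)
            joint[(var, ctx, rec[var])] += 1
            marginal[(var, ctx)] += 1

    def prob(var, value, rec):
        # Smoothed conditional probability of `value` given the record's parents.
        ctx = tuple(rec[p] for p in PARENTS[var])
        num = joint[(var, ctx, value)] + alpha
        den = marginal[(var, ctx)] + alpha * len(STATES)
        return num / den

    return prob


def flag_suspects(records, prob, threshold=0.2):
    """Return indices of records holding a value that is improbable given its parents."""
    return [
        i
        for i, rec in enumerate(records)
        if any(prob(var, rec[var], rec) < threshold for var in PARENTS)
    ]


def make(length, strength, micronaire):
    return {"length": length, "strength": strength, "micronaire": micronaire}


# Synthetic records: three consistent groups plus one inconsistent entry at the end.
records = (
    [make("low", "low", "high")] * 10
    + [make("mid", "mid", "mid")] * 10
    + [make("high", "high", "low")] * 10
    + [make("high", "low", "low")]  # long fiber recorded with low strength
)

prob = fit_cpts(records)
suspects = flag_suspects(records, prob)
print(suspects)  # prints [30]
```

In this toy data set, only the final record is flagged, because low strength given long fiber receives a smoothed probability of 2/14, which falls below the assumed threshold. A flagged record is a candidate for manual review, not proof of an error.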