
Research Project: Next-Generation Approaches for Monitoring and Management of Stored Product Insects

Location: Stored Product Insect and Engineering Research

Title: uafR: An open-source R package that automates mass spectrometry data processing

Author
STRATTON, CHASE - The Land Institute
HANSEN, PHILLIPP - The Land Institute
THOMPSON, YVONNE - The Land Institute
SIO, KONILO - Agrocampus Ouest
Morrison, William - Rob
MURRELL, EBONY - The Land Institute

Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/12/2024
Publication Date: 7/5/2024
Citation: Stratton, C.A., Hansen, P., Thompson, Y., Sio, K., Morrison III, W.R., Murrell, E.G. 2024. uafR: An open-source R package that automates mass spectrometry data processing. PLOS ONE. 19(7): e0306202. https://doi.org/10.1371/journal.pone.0306202.
DOI: https://doi.org/10.1371/journal.pone.0306202

Interpretive Summary: We live in a world of odors and chemicals, and this is especially important for our crops, their pests, and understanding their interactions. Increasingly, researchers are able to generate extremely large datasets consisting of metabolomes (all the metabolic compounds in an organism) and volatilomes (all the odors produced by an organism). While the machinery used to collect these data has become increasingly sophisticated, our ability to process and interpret them has lagged severely, hindering innovation and breakthroughs. In this interdisciplinary collaboration between plant scientists, entomologists, and data scientists, we have developed an R statistics package, called uafR, that automates the grueling manual annotation process for gas/liquid-chromatography coupled mass spectrometry (GC/LC-MS) data and allows anyone interested in chemical comparisons to quickly perform advanced analyses. Our streamlined methods for data processing and advanced analytical pipelines grant anyone with minimal computing experience access to an exhaustive source of chemical information. Interpretations can now be done at a fraction of the time, cost, and effort it would typically take using a standard chemical ecology data analysis pipeline. In a dataset of purified standards, we showed that our algorithms correctly identified the known compounds, with high correlation to manual annotations of the data. In a large, previously published dataset, we found that the number and types of compounds identified were comparable (or identical) to those identified with the traditional manual peak annotation process. Both the speed and accuracy of GC/LC-MS data processing are drastically improved with uafR. Use of uafR will allow larger datasets to be collected and quickly identified, as well as enable backlogs of collected data to be processed.

Technical Abstract: Chemical information has become increasingly ubiquitous, and its generation has outstripped the pace of analysis and interpretation. We have developed an R package, uafR, that automates a grueling manual annotation process for gas/liquid-chromatography coupled mass spectrometry (GC/LC-MS) data and allows anyone interested in chemical comparisons to quickly perform advanced structural similarity matches. Our streamlined methods for data processing and advanced analytical pipelines grant anyone with minimal computing experience access to an exhaustive source of chemical information. Interpretations can now be done at a fraction of the time, cost, and effort it would typically take using a standard chemical ecology data analysis pipeline. The package was tested in two experimental contexts: (1) a dataset of purified internal standards, which showed that our algorithms correctly identified the known compounds with R² values ranging from 0.827 to 0.999 across concentrations ranging from 1 × 10⁻⁵ to 1 × 10³ ng/µl; and (2) a large, previously published dataset, where the number and types of compounds identified were comparable (or identical) to those identified with the traditional manual peak annotation process, and NMDS analysis of the compounds produced the same pattern of significance as in the original study. Both the speed and accuracy of GC/LC-MS data processing are drastically improved with uafR. Use of uafR will allow larger datasets to be collected and quickly identified, as well as enable backlogs of collected data to be processed. This is critical as we enter the era of metabolomics and volatilomics. This package was developed to advance collective understanding of chemical data and is applicable to any research that benefits from GC/LC-MS analysis.
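
For readers unfamiliar with the R² figures quoted above, the following is a minimal sketch of how a coefficient of determination can be computed for a dilution series of a standard. It is not the uafR implementation and does not use the study's data: the concentrations and peak areas are invented for illustration, the function name `r_squared` is hypothetical, and fitting on a log-log scale is an assumption made here because the stated concentration range spans eight orders of magnitude.

```python
import math


def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical dilution series in ng/µl (illustrative only, not study data).
concentrations = [1e-5, 1e-3, 1e-1, 1e1, 1e3]
# Hypothetical integrated peak areas for one standard (also invented).
peak_areas = [2.1e1, 1.8e3, 2.2e5, 1.9e7, 2.0e9]

# Assumption: compare on a log10 scale so that the largest concentrations
# do not dominate the fit.
log_conc = [math.log10(c) for c in concentrations]
log_area = [math.log10(a) for a in peak_areas]

r2 = r_squared(log_conc, log_area)
print(f"R^2 on log-log scale: {r2:.3f}")
```

A value near 1 indicates that the automatically quantified signal tracks the known concentration closely; the abstract reports values from 0.827 to 0.999 for the purified standards.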