Research Project: Integrated Disease Management of Exotic and Emerging Plant Diseases of Horticultural Crops

Location: Horticultural Crops Disease and Pest Management Research Unit

Title: Taxa: An R package implementing data standards and methods for manipulation of taxonomic data

Author
Foster, Zachary - Oregon State University
Chamberlain, S - University of California
Grunwald, Niklaus - Nik

Submitted to: F1000Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/5/2018
Publication Date: 3/5/2018
Citation: Foster, Z.S., Chamberlain, S., Grunwald, N.J. 2018. Taxa: An R package implementing data standards and methods for manipulation of taxonomic data. F1000Research. 7:272. https://doi.org/10.12688/f1000research.14013.1.
DOI: https://doi.org/10.12688/f1000research.14013.1

Interpretive Summary: R is a programming language for statistical computing that is widely used in science. We developed new tools in R and released them as the taxa R package, which provides a set of tools for defining and manipulating taxonomic data. Taxa is currently used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers.

Technical Abstract: The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this information is encoded in many different ways, and the hierarchical nature of taxonomic classifications makes it difficult to work with. Many R packages use taxonomic data to varying degrees, but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution for storing and manipulating taxonomic data in R, together with any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and lineages) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify the taxonomy or the associated data in such a way that both are kept in sync. Taxa is currently used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers.
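
The idea of parsing lineages while preserving associated data, and then filtering so that taxonomy and data stay in sync, can be illustrated with a minimal sketch. It is written in Python rather than R and is not the taxa package's API: the names TaxMap, parse_lineages, subtaxa, and filter_by_name, the semicolon separator, and the sample records are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaxMap:
    """Toy container (hypothetical): a taxonomy tree plus data keyed by taxon ID."""
    names: dict = field(default_factory=dict)    # taxon ID -> taxon name
    parents: dict = field(default_factory=dict)  # taxon ID -> parent ID (None for roots)
    obs: dict = field(default_factory=dict)      # taxon ID -> list of associated records


def parse_lineages(records, sep=";"):
    """Build a TaxMap from (lineage string, associated data) pairs."""
    tm = TaxMap()
    ids = {}  # lineage path (tuple of names) -> taxon ID, so shared ranks are stored once
    for lineage, data in records:
        path, parent = (), None
        for name in lineage.split(sep):
            name = name.strip()
            path += (name,)
            if path not in ids:
                tid = f"t{len(ids) + 1}"
                ids[path] = tid
                tm.names[tid] = name
                tm.parents[tid] = parent
            parent = ids[path]
        # Attach the record to the most specific taxon in its lineage.
        tm.obs.setdefault(parent, []).append(data)
    return tm


def subtaxa(tm, tid):
    """Return tid together with all of its descendants."""
    found = {tid}
    changed = True
    while changed:
        changed = False
        for child, parent in tm.parents.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def filter_by_name(tm, name):
    """Keep taxa called `name` and their subtaxa; drop data tied to removed taxa."""
    keep = set()
    for tid, taxon_name in tm.names.items():
        if taxon_name == name:
            keep |= subtaxa(tm, tid)
    return TaxMap(
        names={t: n for t, n in tm.names.items() if t in keep},
        # A kept taxon whose parent was removed becomes a root.
        parents={t: (p if p in keep else None)
                 for t, p in tm.parents.items() if t in keep},
        obs={t: recs for t, recs in tm.obs.items() if t in keep},
    )


# Hypothetical example data: lineage strings with per-sample read counts.
records = [
    ("Eukaryota;Oomycota;Phytophthora", {"sample": "A", "reads": 12}),
    ("Eukaryota;Oomycota;Pythium", {"sample": "B", "reads": 7}),
    ("Bacteria;Proteobacteria", {"sample": "C", "reads": 30}),
]
tm = parse_lineages(records)
oomycetes = filter_by_name(tm, "Oomycota")
print(sorted(oomycetes.names.values()))
print([d["sample"] for recs in oomycetes.obs.values() for d in recs])
```

In this sketch a single filtering step removes both the unwanted taxa and the records attached to them, which is the "kept in sync" behavior the abstract describes. The real package's data structures and function signatures may differ substantially.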