Author
SCHULZ, KATJA - Smithsonian Institution
HAMMOCK, JENNIFER - Smithsonian Institution
Parr, Cynthia - Cyndy
Submitted to: Ecological Society of America (ESA)
Publication Type: Proceedings
Publication Acceptance Date: 5/21/2015
Publication Date: N/A
Citation: N/A
Interpretive Summary: Easy access to large amounts of data about the distribution, ecology, life history, physiology, and morphology of species has the potential to transform biodiversity research. However, most of the data generated so far cannot easily be integrated or repurposed because there is no standard way for scientists to describe the characteristics of organisms, the context of their observations, and the methods used to collect the data. TraitBank (eol.org/traitbank) addresses this impediment by linking information aggregated from diverse sources to community-developed ontologies and controlled vocabularies. These post hoc semantic annotations improve the discoverability and queryability of the data and provide interoperability with other semantic resources. TraitBank collects information about the characteristics of animals, plants, fungi, and microbes. It covers many different topics and includes species traits that the Group on Earth Observations Biodiversity Observation Network (GEO BON) has identified as Essential Biodiversity Variables, e.g., measures of body size, phenology, migratory behavior, and physiological traits such as thermal tolerance and metabolic rate. Data can be downloaded as CSV files or through a JSON-LD service. Reuse and redistribution of the data, with attribution to the original sources, is encouraged.
Technical Abstract: TraitBank currently serves over 11 million measurements and facts for more than 1.7 million taxa. These data are mobilized from major biodiversity information systems (e.g., International Union for Conservation of Nature, Ocean Biogeographic Information System, Paleobiology Database), literature supplements (e.g., Dryad Digital Repository, Ecological Archives, Pangaea), label data from natural history collections, and legacy or unpublished data sets.
Each record is accompanied by available metadata on provenance, measurement methods, sampling parameters, etc. TraitBank organizes distributed knowledge from heterogeneous sources into a lightweight, scalable semantic framework that supports retrieval and reuse for a variety of applications, ranging from large-scale synthetic analyses of biodiversity to linked data products and hands-on data science in the classroom. It complements taxon- or subject-specific knowledge management systems by filling gaps (in both taxonomic and trait space), by recruiting new types of data (e.g., from text mining, citizen science, and specimen digitization efforts), and by integrating knowledge across the entire tree of life and multiple scientific domains. The emerging semantic framework will facilitate data discovery, support queries across data sets, and advance data integration and exchange among projects, making more biodiversity data available for scientific and policy-oriented applications.
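The idea of post hoc semantic annotation can be illustrated with a minimal sketch, assuming a simple in-memory model. Source-specific trait labels and units are mapped to one shared term URI and one unit, so a single query on that term retrieves records from sources that labeled the same trait differently. All term URIs, labels, mappings, taxa, and values below are hypothetical placeholders. They are not TraitBank's actual vocabulary, data model, or download service.

```python
from collections import defaultdict

# Hypothetical term URIs standing in for ontology / controlled-vocabulary terms.
BODY_MASS = "http://example.org/terms/body_mass"
GRAMS = "http://example.org/units/gram"

# Hypothetical mapping from source-specific labels to the shared term.
LABEL_TO_TERM = {
    "body mass": BODY_MASS,
    "Body weight": BODY_MASS,
    "adult mass": BODY_MASS,
}

# Hypothetical conversion factors from source units to grams.
UNIT_FACTORS = {"g": 1.0, "grams": 1.0, "kg": 1000.0}

# Illustrative raw records as they might arrive from different sources.
RAW_RECORDS = [
    {"taxon": "Taxon A", "label": "body mass", "value": 310.0, "unit": "g", "source": "source 1"},
    {"taxon": "Taxon B", "label": "Body weight", "value": 2.5, "unit": "kg", "source": "source 2"},
    {"taxon": "Taxon C", "label": "wing length", "value": 12.0, "unit": "cm", "source": "source 3"},
]


def annotate(record):
    """Attach the shared term URI and convert the value to grams.

    Returns None when the label or unit has no mapping, leaving the record
    for manual curation instead of guessing.
    """
    term = LABEL_TO_TERM.get(record["label"].strip())
    factor = UNIT_FACTORS.get(record["unit"].strip())
    if term is None or factor is None:
        return None
    return {
        "taxon": record["taxon"],
        "predicate": term,
        "value": record["value"] * factor,
        "unit": GRAMS,
        "source": record["source"],  # keep provenance for attribution
    }


def index_by_predicate(records):
    """Group annotated records by their shared term URI."""
    index = defaultdict(list)
    for rec in records:
        annotated = annotate(rec)
        if annotated is not None:
            index[annotated["predicate"]].append(annotated)
    return index


if __name__ == "__main__":
    index = index_by_predicate(RAW_RECORDS)
    # One lookup on the shared term finds records from both differently labeled sources.
    for rec in index[BODY_MASS]:
        print(rec["taxon"], rec["value"], rec["source"])
```

Keeping the source field on every annotated record mirrors the point that attribution to original sources should survive integration; unmapped labels are dropped from the index rather than forced onto a term.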