Research Project: Ecological Assessment and Mitigation Strategies to Reduce the Risks of Bees to Stressors in Southern Crop Ecosystems

Location: Pollinator Health in Southern Crop Ecosystems Research

Title: A globally synthesized and flagged bee occurrence dataset and cleaning workflow

Author
item DOREY, JAMES - Flinders University
item FISCHER, ERICA - King's College
item CHESSHIRE, PAIGE - Northern Arizona University
item BOLANOS, ANGELA - The National Autonomous University Of Mexico
item O'RIELLY, ROBERT - Flinders University
item BOSSERT, SILAS - Washington State University
item COLLINS, SHANNON - University Of North Texas
item LICHTENBERG, ELINOR - University Of North Texas
item TUCKER, ERIKA - Biodiversity Outreach Network
item SMITH-PARDO, ALLAN - US Department Of Agriculture (USDA)
item FALCON-BRINDIS, ARMANDO - University Of Kentucky
item GUEVARA, DIEGO - National University Of Colombia
item RIBEIRO, BRUNO - Retired Non ARS Employee
item E DE PEDRO, DIEGO - Centro De Investigacion Cientifica Y De Educacion Superior De Ensenada
item PICKERING, JOHN - University Of Oklahoma
item HUNG, JAMES - University Of Oklahoma
item Parys, Katherine
item McCabe, Lindsie
item ROGAN, MATTHEW - Yale University
item MINCKLEY, ROBERT - University Of Rochester
item JE VELZCO, SANTIAGO - National University Of Colombia
item Griswold, Terry
item ZARILLO, TRACY - University Of Connecticut
item JETZ, WALTER - Yale University
item VANESA SICA, YANINA - Yale University
item ORR, MICHAEL - Stuttgart State Museum Of Natural History
item MELISSA GUZMAN, LAURA - Pontificia Universidad Javeriana
item ASCHER, JOHN - National University Of Singapore
item HUGHES, ALICE - University Of Hong Kong
item COBB, NEIL - Biodiversity Outreach Network

Submitted to: Scientific Data - Nature
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/9/2023
Publication Date: 11/2/2023
Citation: Dorey, J.B., Fischer, E.E., Chesshire, P.R., Nava-Bolanos, A.N., O'Rielly, R.L., Bossert, S., Collins, S.M., Lichtenberg, E.M., Tucker, E.M., Smith-Pardo, A., Falcon-Brindis, A., Guevara, D.A., Ribeiro, B., De Pedro, D., Pickering, J., Hung, K.J., Parys, K.A., McCabe, L.M., Rogan, M.S., Minckley, R.L., Velzco, J.E., S., Griswold, T.L., Zarillo, T.A., Jetz, W., Sica, Y.V., Orr, M.C., Guzman, L.M., Ascher, J.S., Hughes, A.C., Cobb, N.S. 2023. A globally synthesized and flagged bee occurrence dataset and cleaning workflow. Scientific Data - Nature. 10(747):2023. https://doi.org/10.1038/s41597-023-02626-w.
DOI: https://doi.org/10.1038/s41597-023-02626-w

Interpretive Summary: Occurrence data are foundational for scientific research and communication, yet preparing them reliably remains a major accessibility issue. We present a new global bee occurrence dataset and cleaning workflow to overcome this issue. Bee occurrence data from major data repositories (GBIF, SCAN-bugs, iDigBio, USGS, and ALA) and private datasets were merged and standardised, and duplicates were identified, using a reproducible R workflow. We undertook (i) data carpentry to align the naming of occurrence taxonomy (using the curated Discover Life website), country, and collection date and (ii) data flagging for a series of potential quality issues. Summary figures are used to visualise these complex data outputs. The data are provided in two formats, “completely-cleaned” and “flagged-but-uncleaned”. The script and R Markdown used to merge, edit, flag, and filter the data are provided. These datasets and associated scripts will be improved and updated periodically.

Technical Abstract: Occurrence data are foundational for scientific research and communication, yet preparing them reliably remains a major accessibility issue. We present a new global bee occurrence dataset and cleaning workflow to overcome this issue. Bee occurrence data from major data repositories (GBIF, SCAN-bugs, iDigBio, USGS, and ALA) and private datasets were merged and standardised, and duplicates were identified, using a reproducible R workflow. We undertook (i) data “carpentry” to harmonise occurrence taxonomy (using the curated Discover Life website), country, and collection date and (ii) data “flagging” for a series of potential quality issues. Summary figures are used to visualise these complex data outputs. The data are provided in two formats, “completely-cleaned” and “flagged-but-uncleaned”. The script and R Markdown used to merge, edit, flag, and filter the data are provided. These datasets and associated scripts will be improved and updated periodically.
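
The merge, flag, and filter steps described above can be sketched in a few lines. This is only an illustration in Python, not the authors' R workflow. The record fields, the flag names, the duplicate key, and the example records are all assumptions made for this sketch and are not taken from the publication. It shows how a "flagged-but-uncleaned" list keeps every record with its flags attached, while a "completely-cleaned" list keeps only records with no flags.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Occurrence:
    """One occurrence record; the field names are hypothetical."""
    source: str
    species: str
    country: str
    event_date: Optional[str]      # ISO 8601 string, or None when missing
    latitude: Optional[float]
    longitude: Optional[float]
    flags: list = field(default_factory=list)


def flag_record(rec, accepted_names):
    """Return illustrative quality flags for one record (assumed checks)."""
    flags = []
    if rec.species not in accepted_names:
        flags.append("name_not_matched")
    if rec.latitude is None or rec.longitude is None:
        flags.append("missing_coordinates")
    elif not (-90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180):
        flags.append("coordinates_out_of_range")
    if rec.event_date is None:
        flags.append("missing_date")
    else:
        try:
            date.fromisoformat(rec.event_date)
        except ValueError:
            flags.append("invalid_date")
    return flags


def merge_and_flag(sources, accepted_names):
    """Merge several record lists, flag each record, and mark repeats."""
    seen = set()
    merged = []
    for records in sources:
        for rec in records:
            # Assumed duplicate key: same taxon, place, date, and coordinates.
            key = (rec.species, rec.country, rec.event_date,
                   rec.latitude, rec.longitude)
            rec.flags = flag_record(rec, accepted_names)
            if key in seen:
                rec.flags.append("duplicate")
            seen.add(key)
            merged.append(rec)
    return merged


def completely_cleaned(flagged):
    """Keep only records that carry no flags."""
    return [rec for rec in flagged if not rec.flags]


# Made-up example records from two hypothetical repositories.
accepted = {"Apis mellifera", "Bombus impatiens"}
repo_a = [
    Occurrence("repo_a", "Apis mellifera", "Australia", "2020-01-15", -35.0, 138.6),
    Occurrence("repo_a", "Bombus impatiens", "United States", None, 33.4, -90.9),
]
repo_b = [
    Occurrence("repo_b", "Apis mellifera", "Australia", "2020-01-15", -35.0, 138.6),
    Occurrence("repo_b", "Apis melifera", "Australia", "2021-02-30", 95.0, 10.0),
]

flagged = merge_and_flag([repo_a, repo_b], accepted)   # "flagged-but-uncleaned"
cleaned = completely_cleaned(flagged)                  # "completely-cleaned"
```

Keeping the flags on the records rather than deleting flagged rows at once lets a user apply a stricter or looser filter later, which is the practical reason for offering both formats.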