Location: Pollinator Health in Southern Crop Ecosystems Research
Title: A globally-synthesized and flagged bee occurrence dataset and cleaning workflow
Author
DOREY, JAMES - Flinders University
CHESSHIRE, PAIGE - Northern Arizona University
BOLANOS, ANGELA - The National Autonomous University Of Mexico
ORIELLY, ROBERT - Flinders University
BOSSERT, SILAS - Washington State University
COLLINS, SHANNON - University Of North Texas
LICHTENBERG, ELINOR - University Of North Texas
TUCKER, ERIKA - Biodiversity Outreach Network
SMITH-PARDO, ALLAN - US Department Of Agriculture (USDA)
FALCON-BRINDIS, ARMANDO - University Of Kentucky
GUEVARA, DIEGO - National University Of Colombia
RIBEIRO, BRUNO - Retired Non ARS Employee
E DE PEDRO, DIEGO - Centro De Investigacion Cientifica Y De Educacion Superior De Ensenada
FISCHER, ERICA - King's College
PICKERING, JOHN - University Of Oklahoma
HUNG, JAMES - University Of Oklahoma
Parys, Katherine
McCabe, Lindsie
ROGAN, MATTHEW - Yale University
MINCKLEY, ROBERT - University Of Rochester
JE VELZCO, SANTIAGO - National University Of Colombia
Griswold, Terry
ZARILLO, TRACY - University Of Connecticut
JETZ, WALTER - Yale University
VANESA SICA, YANINA - Yale University
ORR, MICHAEL - Stuttgart State Museum Of Natural History
MELISSA GUZMAN, LAURA - Pontificia Universidad Javeriana
ASCHER, JOHN - National University Of Singapore
HUGHES, ALICE - University Of Hong Kong
COBB, NEIL - Biodiversity Outreach Network
Submitted to: Scientific Data - Nature
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/9/2023
Publication Date: 11/2/2023
Citation: Dorey, J.B., Chesshire, P., Bolanos, A.N., Orielly, R., Bossert, S., Collins, S., Lichtenberg, E.M., Tucker, E., Smith-Pardo, A., Falcon-Brindis, A., Guevara, D.A., Ribeiro, B., E De Pedro, D., Fischer, E.E., Pickering, J., Hung, J., Parys, K.A., McCabe, L.M., Rogan, M., Minckley, R., Je Velzco, S., Griswold, T.L., Zarillo, T., Jetz, W., Vanesa Sica, Y., Orr, M.C., Melissa Guzman, L., Ascher, J., Hughes, A., Cobb, N. 2023. A globally-synthesized and flagged bee occurrence dataset and cleaning workflow. Scientific Data - Nature.
DOI: https://doi.org/10.1038/s41597-023-02626-w

Interpretive Summary: Occurrence data are foundational for scientific research and communication, yet their reliable preparation represents a major accessibility issue. We present a new global bee occurrence dataset and cleaning workflow to overcome this issue. Bee occurrence data from major data repositories (GBIF, SCAN-bugs, iDigBio, USGS, and ALA) and private datasets were merged and standardised, and duplicates were identified, using a reproducible R workflow. We undertook (i) data carpentry to align the naming of occurrence taxonomy (using the curated Discover Life website), country, and collection date and (ii) data flagging for a series of potential quality issues. Summary figures are used to visualise these complex data outputs. The data are provided in two formats, “completely-cleaned” and “flagged-but-uncleaned”. The script and R Markdown used to merge, edit, flag, and filter the data are provided. These datasets and associated scripts will be improved and updated periodically.

Technical Abstract: Occurrence data are foundational for scientific research and communication, yet their reliable preparation represents a major accessibility issue. We present a new global bee occurrence dataset and cleaning workflow to overcome this issue. Bee occurrence data from major data repositories (GBIF, SCAN-bugs, iDigBio, USGS, and ALA) and private datasets were merged and standardised, and duplicates were identified, using a reproducible R workflow. We undertook (i) data “carpentry” to harmonise occurrence taxonomy (using the curated Discover Life website), country, and collection date and (ii) data “flagging” for a series of potential quality issues. Summary figures are used to visualise these complex data outputs. The data are provided in two formats, “completely-cleaned” and “flagged-but-uncleaned”. The script and R Markdown used to merge, edit, flag, and filter the data are provided. These datasets and associated scripts will be improved and updated periodically.
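The merge, standardise, flag, and filter sequence described above can be illustrated with a small sketch. This is not the authors' R workflow: it is a minimal Python stand-in, and the field names, the three example records, the two accepted date formats, the country lookup, and the three flags are all assumptions made for illustration. It shows only how one flagged table can yield both a "flagged-but-uncleaned" and a "completely-cleaned" output.

```python
from datetime import datetime

# Hypothetical records from three sources; field names and values are illustrative only.
records = [
    {"source": "GBIF", "species": "Apis mellifera", "country": "australia",
     "date": "2019-03-05", "lat": -35.0, "lon": 138.6},
    {"source": "ALA", "species": "Apis mellifera", "country": "Australia",
     "date": "05/03/2019", "lat": -35.0, "lon": 138.6},
    {"source": "iDigBio", "species": "Bombus impatiens", "country": "USA",
     "date": "", "lat": None, "lon": None},
]

# Assumed date formats and country spellings; a real workflow would need many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")
COUNTRY_NAMES = {"usa": "United States", "australia": "Australia"}


def parse_date(text):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardise(rec):
    """Harmonise country name and collection date (the 'carpentry' step)."""
    out = dict(rec)
    country = rec["country"].strip()
    out["country"] = COUNTRY_NAMES.get(country.lower(), country)
    out["date"] = parse_date(rec["date"])
    return out


def flag(recs):
    """Attach boolean quality flags without removing any record."""
    seen = set()
    flagged = []
    for rec in recs:
        key = (rec["species"], rec["country"], rec["date"], rec["lat"], rec["lon"])
        flags = {
            "missing_coordinates": rec["lat"] is None or rec["lon"] is None,
            "missing_date": rec["date"] is None,
            "duplicate": key in seen,  # later copies of an already-seen key are flagged
        }
        seen.add(key)
        flagged.append({**rec, "flags": flags})
    return flagged


# Every record is kept, each carrying its flags.
flagged_but_uncleaned = flag([standardise(r) for r in records])

# Only records with no raised flag are kept.
completely_cleaned = [r for r in flagged_but_uncleaned if not any(r["flags"].values())]
```

Keeping the flags as columns rather than deleting records at each step lets a user choose which quality issues matter for a given analysis, which is presumably why both formats are distributed.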