4.7 Article Data Paper

A globally synthesised and flagged bee occurrence dataset and cleaning workflow

Journal

SCIENTIFIC DATA
Volume 10, Issue 1, Pages -

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-023-02626-w

Keywords

-

Ask authors/readers for more resources

This study presents the BeeBDC package and a global bee occurrence dataset, which address the limited availability and reliability of bee species data. By combining and cleaning millions of records, the authors provide users with clean and flagged bee occurrence data, along with the ability to modify filtering parameters.
Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates and, we added record-level flags for a series of potential quality issues. These data are provided in two formats, cleaned and flagged-but-uncleaned. The BeeBDC package with online documentation provides end users the ability to modify filtering parameters to address their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available