Article

ecocomDP: A flexible data design pattern for ecological community survey data

Journal

ECOLOGICAL INFORMATICS
Volume 64

Publisher

ELSEVIER
DOI: 10.1016/j.ecoinf.2021.101374

Keywords

Ecological community survey; Data harmonization; Workflow; ecocomDP; NEON; LTER; LTREB

Funding

  1. National Science Foundation [1931143, 1931174, 1926568, 1545288, 1929393]
  2. National Science Foundation through the NEON Program
  3. National Science Foundation through the Long Term Ecological Research (LTER) Program
  4. Direct For Biological Sciences [1545288, 1929393] Funding Source: National Science Foundation
  5. Direct For Biological Sciences
  6. Division Of Environmental Biology [1926568] Funding Source: National Science Foundation
  7. Division Of Environmental Biology [1929393, 1545288] Funding Source: National Science Foundation
  8. Div Of Biological Infrastructure
  9. Direct For Biological Sciences [1931174, 1931143] Funding Source: National Science Foundation

The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards both locally and globally have been more successful for some research domains than others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data 'silo' but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data's discovery and use.
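To make the idea of "reformatting without altering original data" more concrete, here is a minimal Python sketch, assuming a hypothetical wide-format survey table (one row per site visit, one column per taxon). It reshapes the table into long-format observation records that carry the source dataset identifier for provenance, and it only reads the input: blank cells become `None` rather than being dropped or imputed, so no cleaning or aggregation happens. The `Observation` record, its field names, the `wide_to_long` helper, and the package identifier `knb-lter-xyz.1.1` are illustrative assumptions loosely inspired by a long-format observation table. They are not the ecocomDP specification or the API of any ecocomDP software.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical long-format record; field names are illustrative assumptions,
# not the published ecocomDP table definitions.
@dataclass(frozen=True)
class Observation:
    observation_id: str
    package_id: str          # identifier of the source dataset, kept for provenance
    location_id: str
    datetime: str
    taxon_id: str
    variable_name: str
    value: Optional[float]   # None where the source cell was blank
    unit: str


def wide_to_long(rows: List[Dict[str, str]], package_id: str,
                 taxon_columns: List[str], variable_name: str = "count",
                 unit: str = "number") -> List[Observation]:
    """Reshape a wide survey table into long-format observations.

    The input rows are only read, never modified, and blank cells are kept
    as None instead of being dropped or filled in (no cleaning step).
    """
    out: List[Observation] = []
    for i, row in enumerate(rows):
        for taxon in taxon_columns:
            raw = row.get(taxon, "")
            out.append(Observation(
                observation_id=f"{package_id}_{i}_{taxon}",
                package_id=package_id,
                location_id=row["site"],
                datetime=row["date"],
                taxon_id=taxon,
                variable_name=variable_name,
                value=None if raw == "" else float(raw),
                unit=unit,
            ))
    return out


# Usage with a tiny, made-up source table (hypothetical data).
source = [
    {"site": "A", "date": "2019-06-01", "sp1": "3", "sp2": ""},
    {"site": "B", "date": "2019-06-01", "sp1": "0", "sp2": "5"},
]
obs = wide_to_long(source, "knb-lter-xyz.1.1", ["sp1", "sp2"])
for o in obs:
    print(o.location_id, o.taxon_id, o.value)
```

Because the reshaping is a pure function of the source table, it can be re-run whenever the source dataset is updated, which is the property that allows downstream processing to be automated as long as the source layout (and sampling protocol) stays the same.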
