Journal
ECOLOGICAL INFORMATICS
Volume 64
Publisher
ELSEVIER
DOI: 10.1016/j.ecoinf.2021.101374
Keywords
Ecological community survey; Data harmonization; Workflow; ecocomDP; NEON; LTER; LTREB
Funding
- National Science Foundation [1931143, 1931174, 1926568, 1545288, 1929393]
- National Science Foundation through the NEON Program
- National Science Foundation through the Long Term Ecological Research (LTER) Program
- Direct For Biological Sciences [1545288, 1929393] Funding Source: National Science Foundation
- Direct For Biological Sciences
- Division Of Environmental Biology [1926568] Funding Source: National Science Foundation
- Division Of Environmental Biology [1929393, 1545288] Funding Source: National Science Foundation
- Div Of Biological Infrastructure
- Direct For Biological Sciences [1931174, 1931143] Funding Source: National Science Foundation
Data harmonization has been pursued for decades, but it remains difficult for studies in which sampling protocols vary greatly and complex environmental conditions are involved. The paper discusses a collaboration between an environmental data repository and a national observatory that creates a decentralized model for reformatting data without altering the original data. The approach lets the repository contribute subsets of its available data to different analysis-ready data preparation efforts while retaining metadata and enabling programmatic data access.
The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards, both locally and globally, have been more successful for some research domains than for others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data 'silo' but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data's discovery and use.
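To make the idea of "reformatting without altering" concrete, the following is a minimal Python sketch, not the authors' workflow or the actual ecocomDP schema. It reshapes a wide site-by-taxon survey table into long-format observation records, copies every value unchanged (no cleaning or aggregation), and attaches the sampling method and a source identifier to each record. All names here (`Observation`, `harmonize`, the `site` and `date` columns, and the `example-dataset.1` identifier) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One long-format record; field names are illustrative assumptions."""
    source_id: str   # identifier of the unmodified source dataset (provenance)
    method: str      # sampling method, carried over from the source metadata
    location: str
    date: str
    taxon: str
    value: float
    unit: str


def harmonize(rows, taxon_columns, source_id, method, unit):
    """Reshape wide survey rows into long-format observations.

    Values are copied as-is: nothing is cleaned, filtered, or aggregated,
    and the input rows are not modified.
    """
    observations = []
    for row in rows:
        for taxon in taxon_columns:
            observations.append(
                Observation(
                    source_id=source_id,
                    method=method,
                    location=row["site"],
                    date=row["date"],
                    taxon=taxon,
                    value=row[taxon],
                    unit=unit,
                )
            )
    return observations


# Hypothetical wide-format source table: one row per site visit,
# one column per taxon.
wide = [
    {"site": "A", "date": "2020-06-01", "Daphnia": 12, "Bosmina": 0},
    {"site": "B", "date": "2020-06-01", "Daphnia": 3, "Bosmina": 7},
]

long_rows = harmonize(
    wide,
    taxon_columns=["Daphnia", "Bosmina"],
    source_id="example-dataset.1",  # hypothetical identifier
    method="net tow",               # hypothetical sampling method
    unit="count",
)

for obs in long_rows:
    print(obs)
```

Because each record keeps its source identifier and method, a downstream user can trace any value back to the original dataset, and zero counts are preserved rather than dropped. Rerunning the same function on an updated source table is one way such a reformatting step could be automated, provided the sampling protocol has not changed.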