4.7 Review

Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse

Journal

GIGASCIENCE
Volume 4, Issue -, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1186/s13742-015-0067-4

Keywords

LAGOS; Integrated database; Data harmonization; Database documentation; Data reuse; Data sharing; Ecoinformatics; Macrosystems ecology; Landscape limnology; Water quality

Funding

  1. National Science Foundation MacroSystems Biology Program in the Emerging Frontiers Division of the Biological Sciences Directorate [EF-1065786, EF-1065649, EF-1065818]
  2. USDA National Institute of Food and Agriculture, Hatch project [176820]
  3. STRIVE Programme from the Environmental Protection Agency, Ireland [2011-W-FS-7]
  4. National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1401954]
  5. National Science Foundation, Directorate for Biological Sciences, Emerging Frontiers [1065786]
  6. National Science Foundation, Directorate for Biological Sciences, Emerging Frontiers [1065818]

Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database.
Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
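
As a rough illustration of the kind of harmonization the abstract describes (different formats and conventions, with provenance kept for each record), the sketch below maps two source datasets onto one shared schema. It is not the LAGOS implementation: the dataset names, column names, units, and values are all hypothetical and chosen only for the example.

```python
# Hypothetical example: every dataset name, column name, and value here is
# invented for illustration and does not come from LAGOS.
from dataclasses import dataclass

# Conversion factors to a common unit (micrograms per litre).
UNIT_TO_UG_PER_L = {"ug/L": 1.0, "mg/L": 1000.0}


@dataclass(frozen=True)
class SourceSpec:
    """Describes how one source dataset maps onto the shared schema."""
    name: str         # provenance label kept on every harmonized row
    lake_id_col: str  # source column holding the lake identifier
    value_col: str    # source column holding total phosphorus
    unit: str         # unit used by the source


def harmonize(rows, spec):
    """Convert source rows (dicts) to the shared schema, keeping provenance."""
    factor = UNIT_TO_UG_PER_L[spec.unit]
    out = []
    for row in rows:
        raw = row.get(spec.value_col)
        if raw in (None, ""):
            continue  # skip missing measurements rather than guessing a value
        out.append({
            "lake_id": str(row[spec.lake_id_col]),
            "tp_ug_per_l": float(raw) * factor,
            "source": spec.name,
        })
    return out


# Two sources with different column names and units.
state_a = SourceSpec("state_a_program", "LakeID", "TP_mgL", "mg/L")
state_b = SourceSpec("state_b_program", "site", "total_p", "ug/L")

rows_a = [{"LakeID": 101, "TP_mgL": "0.02"}, {"LakeID": 102, "TP_mgL": ""}]
rows_b = [{"site": "B-7", "total_p": 35.0}]

integrated = harmonize(rows_a, state_a) + harmonize(rows_b, state_b)
```

Keeping the per-source mapping in a small declarative record, rather than in ad hoc conversion code, is one way to make adding a further dataset a matter of writing one more `SourceSpec`.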
