4.7 Article

Effects of ignoring survey design information for data reuse

Journal

ECOLOGICAL APPLICATIONS
Volume 31, Issue 6, Pages -

Publisher

WILEY
DOI: 10.1002/eap.2360

Keywords

bias; data; database; findable; accessible; interoperable; reusable data; Horvitz-Thompson estimator; inclusion probability; model; population density estimate; reuse; survey design

Funding

  1. Australian Government's National Environmental Science Program
  2. Academy of Finland [317255]


This study examines how well data can be reused in ecological research. It finds that ignoring the survey design can bias density estimates by up to 250%, and that this bias is not reduced by adding more data. It suggests using an appropriate estimator or model to mitigate the bias.
Data are currently being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask the question: Are aggregated databases currently providing the right information to enable effective and unbiased reuse? We investigate this question, with a focus on designs that purposefully favor the selection of some sampling locations (upweighting their probability of selection). Such designs are common; examples include designs with uneven inclusion probabilities and stratified designs. We perform a simulation experiment by creating data sets with progressively more uneven inclusion probabilities and examine the resulting estimates of the average number of individuals per unit area (density). The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This density estimation bias is not reduced by adding more data. Fortunately, the estimation bias can be mitigated by using an appropriate estimator or an appropriate model that incorporates the design information. These remedies are available, however, only when essential information about the survey design is available: the sample location selection process (e.g., inclusion probabilities) and/or the covariates used in its specification. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
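The contrast between a naive estimate and a design-based one can be illustrated with a small simulation. The sketch below is not the authors' experiment: the abstract does not give their simulation settings, so the population model (a single covariate `x` driving Poisson counts), the `skew` parameter that controls how uneven the inclusion probabilities are, the use of Poisson sampling, and all numeric values are assumptions chosen for illustration. It compares the plain sample mean with the Horvitz-Thompson estimator of mean density, which weights each sampled count by the inverse of its inclusion probability.

```python
import numpy as np


def relative_bias(skew, n_sites=10_000, n_expected=200, n_reps=200, seed=1):
    """Return the average relative bias of (naive, Horvitz-Thompson) density estimates.

    All settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    naive_err, ht_err = [], []
    for _ in range(n_reps):
        # Hypothetical population: one habitat covariate drives abundance.
        x = rng.normal(size=n_sites)
        y = rng.poisson(np.exp(0.5 + 0.8 * x))
        true_density = y.mean()

        # Inclusion probabilities favor high-covariate sites;
        # skew = 0 gives every site the same probability.
        weight = np.exp(skew * x)
        pi = np.minimum(1.0, n_expected * weight / weight.sum())

        # Poisson sampling: each site is included independently with probability pi.
        sampled = rng.random(n_sites) < pi
        y_s, pi_s = y[sampled], pi[sampled]

        naive = y_s.mean()                 # ignores the survey design
        ht = np.sum(y_s / pi_s) / n_sites  # Horvitz-Thompson estimate of the mean

        naive_err.append((naive - true_density) / true_density)
        ht_err.append((ht - true_density) / true_density)
    return float(np.mean(naive_err)), float(np.mean(ht_err))


if __name__ == "__main__":
    for skew in (0.0, 0.5, 1.0):
        naive_bias, ht_bias = relative_bias(skew)
        print(f"skew={skew:.1f}  naive bias={naive_bias:+.1%}  HT bias={ht_bias:+.1%}")
```

Under these assumptions the naive bias grows as `skew` increases, while the Horvitz-Thompson estimate stays close to the true density. Note that the weighted estimate needs `pi` for every sampled site, which is exactly the design information the abstract argues should be stored with the data.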
