
Imputation of missing data in life-history trait datasets: which approach performs the best?

METHODS IN ECOLOGY AND EVOLUTION (2014)

Journal

METHODS IN ECOLOGY AND EVOLUTION

Volume 5, Issue 9, Pages 961-970

Publisher

WILEY

DOI: 10.1111/2041-210X.12232

Keywords

Phylopars; missForest; kNN; multivariate imputation by chained equations; phylogeny; carnivores; root mean squared error; body mass; longevity

Category

Ecology

Funding

Capes [PVE 018/2012]
CNPq [302776/2012-5]
NSF [DEB-1146198, DEB-1136586, 1136588]
CAPES/Science [PVE 018/2012]
Direct For Biological Sciences
Division Of Environmental Biology [1136705] Funding Source: National Science Foundation
Division Of Environmental Biology
Direct For Biological Sciences [1136586, 1146198] Funding Source: National Science Foundation

Abstract

Despite efforts in data collection, missing values are commonplace in life-history trait databases. Because these values typically are not missing randomly, the common practice of removing missing data not only reduces sample size but also introduces bias that can lead to incorrect conclusions. Imputing missing values is a potential solution to this problem. Here, we evaluate the performance of four approaches for estimating missing values in trait databases (K-nearest neighbour (kNN), multivariate imputation by chained equations (mice), missForest and Phylopars), and test whether imputed datasets retain underlying allometric relationships among traits. Starting with a nearly complete dataset of four traits for the mammalian order Carnivora, we artificially removed values so that the percentage of missing values ranged from 10% to 80%. Using the original values as a reference, we assessed imputation performance using normalized root mean squared error. We also evaluated whether including phylogenetic information improved imputation performance in kNN, mice and missForest (it is a required input in Phylopars). Finally, we evaluated the extent to which the allometric relationship between two traits (body mass and longevity) was conserved in imputed datasets by measuring the difference (bias) between the slope estimated from the original dataset and the slope estimated from either the imputed datasets or the datasets with missing values removed. Three of the tested approaches (mice, missForest and Phylopars) resulted in qualitatively equivalent imputation performance, and all three had significantly lower errors than kNN. Adding phylogenetic information to the imputation algorithms improved estimation of missing values for all tested traits. The allometric relationship between body mass and longevity was conserved with up to 60% of data missing, either with or without phylogenetic information, depending on the approach.
This relationship was less biased in imputed datasets than in datasets with missing values removed, especially when more than 30% of values were missing. Imputation provides a valuable alternative to removing missing observations in trait databases, as it produces low errors and retains relationships among traits. Although we must continue to prioritize data collection on species traits, imputation can be a valuable tool for conducting macroecological and evolutionary studies using life-history trait databases.
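The abstract scores each imputation by normalized root mean squared error (NRMSE) against the original values, and measures bias as the difference between the original slope and the slope after imputation (or after removing missing values). The sketch below illustrates both metrics in Python with NumPy. It rests on several assumptions that are not taken from the paper: NRMSE uses the missForest-style normalization sqrt(mean((true - imputed)^2) / var(true)), computed only over the removed cells; values are removed completely at random; and the traits are log10-transformed and related by an ordinary least squares slope. The helper names (`nrmse`, `remove_at_random`, `slope`, `slope_bias`) and the synthetic data are hypothetical, and the column-mean imputation in the example is a crude stand-in, not kNN, mice, missForest or Phylopars.

```python
import numpy as np


def nrmse(true_values, imputed_values):
    """Normalized RMSE over the removed cells only.

    Assumption: missForest-style normalization,
    sqrt(mean((true - imputed)^2) / var(true)).
    """
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    mse = np.mean((true_values - imputed_values) ** 2)
    return float(np.sqrt(mse / np.var(true_values)))


def remove_at_random(data, fraction, rng):
    """Return a copy of `data` with `fraction` of its cells set to NaN.

    Assumption: cells are removed completely at random.
    """
    data = np.asarray(data, dtype=float).copy()
    n_missing = int(round(fraction * data.size))
    idx = rng.choice(data.size, size=n_missing, replace=False)
    data.flat[idx] = np.nan
    return data


def slope(x, y):
    """OLS slope of y on x, dropping rows where either value is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    return float(np.polyfit(x[keep], y[keep], 1)[0])


def slope_bias(x_orig, y_orig, x_new, y_new):
    """Slope of the new (imputed or complete-case) data minus the original slope."""
    return slope(x_new, y_new) - slope(x_orig, y_orig)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical synthetic traits: log10 body mass and log10 longevity.
    log_mass = rng.normal(1.0, 1.0, size=200)
    log_longevity = 0.2 * log_mass + 1.0 + rng.normal(0.0, 0.1, size=200)

    # Remove 30% of the longevity values at random.
    longevity_missing = remove_at_random(log_longevity, 0.3, rng)
    missing = np.isnan(longevity_missing)

    # Crude stand-in imputation: fill gaps with the observed column mean.
    imputed = np.where(missing, np.nanmean(longevity_missing), longevity_missing)

    print("NRMSE:", nrmse(log_longevity[missing], imputed[missing]))
    print("slope bias, imputed:", slope_bias(log_mass, log_longevity, log_mass, imputed))
    # NaN rows are dropped inside slope(), i.e. the complete-case analysis.
    print("slope bias, removed:", slope_bias(log_mass, log_longevity, log_mass, longevity_missing))
```

Because column-mean imputation ignores body mass, it pulls the fitted slope toward zero, so its slope bias is expected to be negative; the approaches compared in the paper instead draw on the other traits and, optionally, on phylogeny.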