Article

Addressing Measurement Error in Random Forests Using Quantitative Bias Analysis

Journal

AMERICAN JOURNAL OF EPIDEMIOLOGY
Volume 190, Issue 9, Pages 1830-1840

Publisher

OXFORD UNIV PRESS INC
DOI: 10.1093/aje/kwab010

Keywords

machine learning; measurement error; misclassification; noise; quantitative bias analysis; random forests

Funding

  1. National Institute of Mental Health [1R01 MH109507, 1R01 MH110453-01A1]
  2. US National Library of Medicine [R01LM013049]

Abstract

This study evaluated the impact of measurement error on random-forest model performance and variable importance. Measurement error in the data used to construct random forests can distort both, but quantitative bias analysis can recover the correct results.

Although variables are often measured with error, the impact of measurement error on machine-learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on the performance of random-forest models and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random-forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the National Comorbidity Survey Replication (2001-2003). Second, we created simulated data sets in which we knew the true model performance and variable importance measures and could verify that quantitative bias analysis was recovering the truth in misclassified versions of the data sets. Our findings showed that measurement error in the data used to construct random forests can distort model performance and variable importance measures and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
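
To make the misclassification and bias-analysis steps concrete, the following is a minimal sketch and not the authors' procedure. It assumes a single binary predictor, nondifferential misclassification, and illustrative sensitivity and specificity values (0.80 and 0.95) that do not come from the paper. All function names are hypothetical. The random-forest fit itself is omitted; in a fuller analysis a forest would be refit on each corrected data set, the draws would be repeated many times, and the predictive values would typically be conditioned on the outcome.

```python
import random


def misclassify(values, sensitivity, specificity, rng):
    """Return an error-prone copy of a binary variable (1 = present)."""
    observed = []
    for value in values:
        if value == 1:
            # A true 1 is recorded correctly with probability = sensitivity.
            observed.append(1 if rng.random() < sensitivity else 0)
        else:
            # A true 0 is recorded correctly with probability = specificity.
            observed.append(0 if rng.random() < specificity else 1)
    return observed


def predictive_values(observed_prevalence, sensitivity, specificity):
    """Back-calculate true prevalence, PPV, and NPV from assumed Se and Sp.

    Assumes 0 < observed_prevalence < 1 and sensitivity + specificity > 1.
    """
    # Invert p_obs = Se * p + (1 - Sp) * (1 - p) for the true prevalence p.
    true_prev = (observed_prevalence + specificity - 1) / (sensitivity + specificity - 1)
    true_prev = min(max(true_prev, 0.0), 1.0)
    # PPV = P(true = 1 | observed = 1); NPV = P(true = 0 | observed = 0).
    ppv = min(sensitivity * true_prev / observed_prevalence, 1.0)
    npv = min(specificity * (1 - true_prev) / (1 - observed_prevalence), 1.0)
    return true_prev, ppv, npv


def impute_true_values(observed, ppv, npv, rng):
    """Draw one plausible 'true' value per record from its predictive value."""
    imputed = []
    for value in observed:
        if value == 1:
            imputed.append(1 if rng.random() < ppv else 0)
        else:
            imputed.append(0 if rng.random() < npv else 1)
    return imputed


# Illustrative values only (assumptions, not taken from the study).
SENSITIVITY = 0.80
SPECIFICITY = 0.95

rng = random.Random(0)
true_x = [1 if rng.random() < 0.30 else 0 for _ in range(20000)]
observed_x = misclassify(true_x, SENSITIVITY, SPECIFICITY, rng)

p_obs = sum(observed_x) / len(observed_x)
est_prev, ppv, npv = predictive_values(p_obs, SENSITIVITY, SPECIFICITY)
corrected_x = impute_true_values(observed_x, ppv, npv, rng)
corrected_prev = sum(corrected_x) / len(corrected_x)

print(f"observed prevalence:  {p_obs:.3f}")
print(f"estimated true prev.: {est_prev:.3f}")
print(f"corrected prevalence: {corrected_prev:.3f}")
```

The sketch corrects only the marginal distribution of the predictor. Because the imputation ignores the outcome, it would not by itself restore the predictor's association with the outcome or its variable importance.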
