Article

Nonparametric variable importance assessment using machine learning techniques

Journal

BIOMETRICS
Volume 77, Issue 1, Pages 9-22

Publisher

WILEY
DOI: 10.1111/biom.13392

Keywords

machine learning; nonparametric R^2; statistical inference; targeted learning; variable importance

Funding

  1. National Institute of Allergy and Infectious Diseases [UM1AI068635, F31AI140836, DP5OD019820, R01AI029168]


This work studies a variable importance measure that can be used with any regression technique and whose interpretation does not depend on the technique used. It discusses how machine learning techniques can be used to flexibly estimate the importance of a single feature or a group of features. Simulations and a case study of risk factors for cardiovascular disease in South Africa show that the proposal has good practical operating characteristics.
In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often only resort to one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have a different interpretation, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique, and whose interpretation is agnostic to the technique used. This measure is a property of the true data-generating mechanism. Specifically, we discuss a generalization of the analysis of variance variable importance measure and discuss how it facilitates the use of machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. The importance of each feature or group of features in the data can then be described individually, using this measure. We describe how to construct an efficient estimator of this measure as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
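The idea of a technique-agnostic, ANOVA-style importance measure can be illustrated with a rough sketch. The code below assumes the measure is read as a difference in nonparametric R^2: the held-out R^2 of a regression on all features minus that of a regression that omits the features of interest. This is a naive sample-splitting estimate, not the efficient estimator described in the abstract, and it produces no confidence interval. The k-nearest-neighbour regressor is only a stand-in for "any regression technique", and the names `knn_predict` and `r2_difference_importance`, as well as the simulated data, are hypothetical.

```python
import random
from statistics import fmean, pvariance


def knn_predict(train_x, train_y, query, k=10):
    """Average the responses of the k training points closest to `query`."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )[:k]
    return fmean(train_y[i] for i in nearest)


def held_out_mse(x, y, half, k):
    """Fit on the first `half` rows, return mean squared error on the rest."""
    return fmean(
        (target - knn_predict(x[:half], y[:half], row, k)) ** 2
        for row, target in zip(x[half:], y[half:])
    )


def r2_difference_importance(x, y, drop, k=10):
    """Naive estimate: held-out R^2 with all features minus R^2 without `drop`.

    Assumption: importance is measured as a difference in R^2, which equals
    (MSE_reduced - MSE_full) / Var(Y) on the held-out half.
    """
    half = len(y) // 2
    keep = [j for j in range(len(x[0])) if j not in drop]
    x_reduced = [[row[j] for j in keep] for row in x]
    mse_full = held_out_mse(x, y, half, k)
    mse_reduced = held_out_mse(x_reduced, y, half, k)
    return (mse_reduced - mse_full) / pvariance(y[half:])


# Simulated data (hypothetical): the response depends on feature 0 only.
rng = random.Random(0)
features = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(400)]
response = [2 * row[0] + rng.gauss(0, 0.5) for row in features]

importance_x0 = r2_difference_importance(features, response, drop={0})
importance_x1 = r2_difference_importance(features, response, drop={1})
print("importance of feature 0:", round(importance_x0, 3))
print("importance of feature 1:", round(importance_x1, 3))
```

Because the measure is defined through predictive performance rather than through the internals of a particular model, the k-nearest-neighbour fit could be swapped for any other learner without changing the interpretation. Passing several indices in `drop` gives the importance of a group of features. A small negative estimate for an irrelevant feature is possible in finite samples with this naive approach.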
