Article

Nonparametric variable importance assessment using machine learning techniques

BIOMETRICS (2021)

Journal

BIOMETRICS

Volume 77, Issue 1, Pages 9-22

Publisher

WILEY

DOI: 10.1111/biom.13392

Keywords

machine learning; nonparametric R²; statistical inference; targeted learning; variable importance

Categories

Biology; Mathematical & Computational Biology; Statistics & Probability

Funding

National Institute of Allergy and Infectious Diseases [UM1AI068635, F31AI140836, DP5OD019820, R01AI029168]


Summary

This work studies a variable importance measure that can be used with any regression technique and whose interpretation does not depend on the technique used. It shows how machine learning techniques can be used to flexibly estimate the importance of a single feature or a group of features. Simulations and a case study show that the proposed approach has good practical operating characteristics.
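To make "importance of a feature or group of features" concrete, here is one standard way to write a generalized analysis-of-variance importance measure. The notation is our own and is an assumption, not copied from the paper. Let μ(x) = E(Y | X = x) be the full regression function and μ_s(x) be the regression of Y on every feature except those in the group s. The measure is the share of var(Y) explained by μ but not by μ_s. Because E[{Y − μ(X)}{μ(X) − μ_s(X)}] = 0 (the second factor is a function of X alone), the measure equals a difference of nonparametric R² values, which matches the "nonparametric R²" keyword:

$$
\psi_s
= \frac{E\{\mu(X) - \mu_s(X)\}^2}{\operatorname{var}(Y)}
= \frac{E\{Y - \mu_s(X)\}^2 - E\{Y - \mu(X)\}^2}{\operatorname{var}(Y)}
= R^2(\mu) - R^2(\mu_s),
\qquad
R^2(f) = 1 - \frac{E\{Y - f(X)\}^2}{\operatorname{var}(Y)}.
$$

Neither μ nor μ_s refers to a particular regression technique, so ψ_s is a property of the data-generating distribution. Any learner, including a machine learning method, can be used to estimate the two regression functions.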

Abstract

In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often restrict themselves to one of the few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have different interpretations, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique and whose interpretation is agnostic to the technique used. This measure is a property of the true data-generating mechanism. Specifically, we discuss a generalization of the analysis of variance variable importance measure and explain how it facilitates the use of machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. The importance of each feature or group of features in the data can then be described individually using this measure. We describe how to construct an efficient estimator of this measure as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
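The abstract says the measure works with any regression technique and comes with an efficient estimator and a confidence interval. The following is a minimal Python sketch of that idea under our own assumptions. It is not the authors' software or their exact procedure. The importance of a feature group is estimated as the difference between the R² of the full regression and the R² of the regression that omits the group, both computed from cross-fitted (out-of-fold) predictions. A Wald interval is then built from the estimated influence function. Ordinary least squares stands in for the machine learning regression. The names `ols_fit_predict`, `crossfit`, and `r2_vim` are hypothetical, and the simulated data are purely illustrative.

```python
import numpy as np
from statistics import NormalDist


def ols_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in learner: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_test)), X_test])
    return B @ coef


def crossfit(X, y, cols, n_folds=2, seed=0):
    """Out-of-fold predictions of E(Y | X[:, cols]) via K-fold cross-fitting."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    preds = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        preds[test] = ols_fit_predict(X[~test][:, cols], y[~test], X[test][:, cols])
    return preds


def r2_vim(y, pred_full, pred_reduced, alpha=0.05):
    """Plug-in difference in R^2 with an influence-function-based Wald interval."""
    y = np.asarray(y, dtype=float)
    pf = np.asarray(pred_full, dtype=float)
    pr = np.asarray(pred_reduced, dtype=float)
    n = len(y)
    centered = y - y.mean()
    var_y = np.mean(centered ** 2)
    mse_full = np.mean((y - pf) ** 2)
    mse_reduced = np.mean((y - pr) ** 2)
    psi = (mse_reduced - mse_full) / var_y
    # Estimated influence function of psi at each observation (it averages to 0).
    eif = ((y - pr) ** 2 - (y - pf) ** 2 - psi * centered ** 2) / var_y
    se = np.sqrt(np.mean(eif ** 2) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return psi, (psi - z * se, psi + z * se)


# Simulated example: Y = X1 + 2*X2 + noise with independent standard normals,
# so var(Y) = 6 and the true importances are 4/6 for X2 and 1/6 for X1.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(size=n)

pred_full = crossfit(X, y, [0, 1])
psi_x2, ci_x2 = r2_vim(y, pred_full, crossfit(X, y, [0]))  # drop X2
psi_x1, ci_x1 = r2_vim(y, pred_full, crossfit(X, y, [1]))  # drop X1
print(f"X2: {psi_x2:.3f} CI ({ci_x2[0]:.3f}, {ci_x2[1]:.3f})")
print(f"X1: {psi_x1:.3f} CI ({ci_x1[0]:.3f}, {ci_x1[1]:.3f})")
```

Because `r2_vim` takes only predictions, any regression method can replace `ols_fit_predict` without changing the interpretation of the estimate. This mirrors the technique-agnostic property described in the abstract. One caveat is that this simple Wald interval can behave poorly when the true importance is exactly zero, because the influence function then degenerates.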
