4.7 Article

RHDSI: A novel dimensionality reduction based algorithm on high dimensional feature selection with interactions

Journal

INFORMATION SCIENCES
Volume 574, Issue -, Pages 590-605

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.06.096

Keywords

Interaction terms; High-dimensional data; Feature selection; Dimensionality reduction; Machine learning; Regression

Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC) [RGPIN-2017-06672]
  2. Prostate Cancer Canada

RHDSI is a novel feature selection method that integrates dimensionality reduction and machine learning and can handle high-dimensional data with interaction terms. It performs feature selection in three steps: coarse feature selection, unsupervised statistical learning-based feature refinement, and supervised statistical learning-based final feature selection with interactions. RHDSI performs better than or comparably to standard feature selection algorithms on simulated data and real studies.
Classical statistical learning techniques struggle to perform feature selection in high-dimensional data that includes interaction effects, i.e., cases where one or more independent features influence the effect of another feature on the study outcome. Methods like penalized regression and sparse partial least squares regression can help, but penalization restricts the handling of interaction terms. This study proposes a novel Dimensionality Reduction based algorithm on High Dimensional feature Selection with Interactions (RHDSI), a new feature selection method that integrates dimensionality reduction and machine learning. The method can handle high-dimensional data, incorporate interaction terms, and perform statistically interpretable feature selection, and it enables existing classical statistical techniques to work on high-dimensional data. RHDSI performs feature selection in three steps. The first step is a coarse feature selection through dimensionality reduction and statistical modeling on multiple resampled datasets and features, along with their interaction terms. The second step uses the pooled results for unsupervised statistical learning-based feature refinement. Finally, supervised statistical learning-based feature selection is performed on the refined feature set to identify the final features with interactions. We evaluate the performance of this algorithm on simulated data and real studies. RHDSI shows better or comparable performance relative to standard feature selection algorithms like LASSO, subset selection, and sparse PLS. © 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
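
The three-step structure described in the abstract can be illustrated with a small sketch. The abstract does not specify the models used at each step, so everything below is an assumption made for illustration: ordinary least squares on bootstrap rows and random feature subsets stands in for the coarse step, a one-dimensional two-means split of the pooled coefficients stands in for the unsupervised refinement, and a t-statistic cutoff stands in for the supervised final selection. The function names (`design_with_interactions`, `coarse_selection`, `refine`, `final_selection`) and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np
from itertools import combinations


def design_with_interactions(X, cols):
    """Build main-effect columns for `cols` plus all pairwise products."""
    terms = [(c,) for c in cols] + list(combinations(cols, 2))
    Z = np.column_stack([X[:, list(t)].prod(axis=1) for t in terms])
    return Z, terms


def coarse_selection(X, y, n_resamples=200, subset_size=4, seed=0):
    """Step 1 (assumed form): OLS on bootstrap rows and random feature subsets.

    Small feature subsets keep each regression low-dimensional, so a
    classical model can be fitted even when the full feature set is large.
    Returns a dict mapping each term to its list of estimated coefficients.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pooled = {}
    for _ in range(n_resamples):
        rows = rng.integers(0, n, size=n)  # bootstrap sample of rows
        cols = sorted(rng.choice(p, size=subset_size, replace=False).tolist())
        Z, terms = design_with_interactions(X[rows], cols)
        Z = np.column_stack([np.ones(n), Z])  # add intercept
        beta, *_ = np.linalg.lstsq(Z, y[rows], rcond=None)
        for term, b in zip(terms, beta[1:]):
            pooled.setdefault(term, []).append(b)
    return pooled


def refine(pooled, n_iter=50):
    """Step 2 (assumed form): two-means split on |mean pooled coefficient|."""
    terms = list(pooled)
    score = np.array([abs(np.mean(pooled[t])) for t in terms])
    lo, hi = score.min(), score.max()  # initial cluster centres
    for _ in range(n_iter):
        high = np.abs(score - hi) < np.abs(score - lo)
        if high.all() or not high.any():
            break
        lo, hi = score[~high].mean(), score[high].mean()
    return [t for t, keep in zip(terms, high) if keep]


def final_selection(X, y, terms, t_min=2.0):
    """Step 3 (assumed form): OLS on refined terms, keep those with |t| > t_min."""
    n = len(y)
    cols = [np.ones(n)] + [X[:, list(t)].prod(axis=1) for t in terms]
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - Z.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    t_stats = beta / se
    return [term for term, t in zip(terms, t_stats[1:]) if abs(t) > t_min]


# Synthetic example: two main effects and their interaction drive the outcome.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 10))
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 0] * X[:, 1] + rng.standard_normal(400)

pooled = coarse_selection(X, y)
refined = refine(pooled)
selected = final_selection(X, y, refined)
print(selected)  # terms are tuples of column indices; pairs denote interactions
```

In this sketch a term such as `(0, 1)` denotes the product of columns 0 and 1. Fitting many small models on feature subsets is one simple way to let a classical regression see interaction terms without fitting all of them at once; the published method may pool and refine results quite differently.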
