Article

Feature selection based on self-information and entropy measures for incomplete neighborhood decision systems

Journal

COMPLEX & INTELLIGENT SYSTEMS
Volume 9, Issue 2, Pages 1773-1790

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s40747-022-00882-8

Keywords

Feature selection; Neighborhood multi-granulation rough sets; Self-information; Neighborhood tolerance conditional entropy; Incomplete neighborhood decision system

Abstract

This paper introduces a feature selection method based on neighborhood multi-granulation rough sets (NMRS) to address the information loss that arises when the evaluation function relies only on the lower approximation of the neighborhood decision. A novel measure, named PTSIJE, is proposed for uncertain feature selection in incomplete datasets. Experimental results demonstrate the effectiveness of the proposed method.
For incomplete datasets with mixed numerical and symbolic features, feature selection based on neighborhood multi-granulation rough sets (NMRS) is developing rapidly. However, its evaluation function considers only the information contained in the lower approximation of the neighborhood decision, which easily leads to the loss of some information. To solve this problem, we construct a novel NMRS-based uncertainty measure for feature selection, named neighborhood multi-granulation self-information-based pessimistic neighborhood multi-granulation tolerance joint entropy (PTSIJE), which can be applied to incomplete neighborhood decision systems. First, from the algebra view, four kinds of neighborhood multi-granulation self-information measures of decision variables are proposed using the upper and lower approximations of NMRS. We discuss their related properties and find that the fourth measure, lenient neighborhood multi-granulation self-information (NMSI), has better classification performance. Then, inspired by both the algebra and information views, a feature selection method based on PTSIJE is proposed. Finally, the Fisher score method is used to delete uncorrelated features and thereby reduce computational complexity for high-dimensional gene datasets, and a heuristic feature selection algorithm is proposed to improve classification performance on mixed and incomplete datasets. Experimental results on 11 datasets show that our method selects fewer features and achieves higher classification accuracy than related methods.
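
The abstract describes a two-stage pipeline: Fisher-score prefiltering for high-dimensional gene data, followed by a heuristic forward search guided by an uncertainty measure (PTSIJE). The sketch below illustrates only that pipeline shape under stated assumptions; it is not the authors' implementation. PTSIJE itself is not implemented here, so `significance` is a hypothetical placeholder for it, and the function names, the candidate cutoff, and the toy measure in the usage example are illustrative assumptions.

```python
# Conceptual sketch of the two-stage pipeline described in the abstract:
# (1) Fisher-score prefiltering for high-dimensional data, followed by
# (2) a greedy forward search guided by an uncertainty measure.
# PTSIJE is NOT implemented; `significance` is a hypothetical stand-in,
# and all names and thresholds here are illustrative assumptions.
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)

def greedy_forward_selection(X, y, significance, candidates, max_features=20):
    """Repeatedly add the candidate feature that most increases `significance`
    (a stand-in for a measure such as PTSIJE); stop when no candidate helps."""
    selected, best_val = [], -np.inf
    while len(selected) < max_features:
        gains = [(significance(X[:, selected + [j]], y), j)
                 for j in candidates if j not in selected]
        if not gains:
            break
        val, j = max(gains)
        if val <= best_val:
            break
        selected.append(j)
        best_val = val
    return selected

if __name__ == "__main__":
    # Toy usage: keep the 15 features with the highest Fisher scores, then run
    # the greedy search with a toy measure (negative mean within-class variance).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))
    y = rng.integers(0, 2, size=100)
    keep = list(np.argsort(fisher_scores(X, y))[-15:])
    toy_measure = lambda Xs, yy: -np.mean([Xs[yy == c].var() for c in np.unique(yy)])
    print(greedy_forward_selection(X, y, toy_measure, keep))
```

The prefilter keeps the search over subsets tractable on high-dimensional data, while the greedy loop mirrors the heuristic search described in the abstract; swapping the toy measure for a real PTSIJE computation would require the neighborhood tolerance relations defined in the paper.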
