Article

Variable selection for Naive Bayes classification

Journal

COMPUTERS & OPERATIONS RESEARCH
Volume 135, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.cor.2021.105456

Keywords

Clustering; Conditional independence; Dependence measures; Heuristics; Probabilistic classification; Cost-sensitive classification

Funding

  1. Ministerio de Economia y Competitividad, Spain [MTM2015-65915-R]
  2. Ministerio de Ciencia, Innovacion y Universidades, Spain [PID2019-110886RB-I00]
  3. Junta de Andalucia, Spain [FQM-329, P18-FR-2369]
  4. Universidad de Cadiz, Spain [PR2019-029]
  5. Fundacion BBVA
  6. EC H2020 MSCA RISE NeEDS Project [822214]

Abstract

The proposed sparse Naive Bayes classifier takes into account the correlation structure of features, allows for flexible selection of performance measures, and includes performance constraints for groups of higher interest. This approach leads to competitive results in terms of accuracy, sparsity, and running times for balanced datasets, while also achieving a better compromise between classification rates for different classes in unbalanced datasets.
Naive Bayes has proven to be a tractable and efficient method for classification in multivariate analysis. However, features are usually correlated, a fact that violates the Naive Bayes assumption of conditional independence and may deteriorate the method's performance. Moreover, datasets often contain a large number of features, which can complicate the interpretation of the results as well as slow down the method's execution. In this paper we propose a sparse version of the Naive Bayes classifier that is characterized by three properties. First, sparsity is achieved by taking into account the correlation structure of the covariates. Second, different performance measures can be used to guide the selection of features. Third, performance constraints on groups of higher interest can be included. Our proposal leads to a smart search that yields competitive running times while integrating flexibility in the choice of classification performance measure. Our findings show that, when compared against well-referenced feature selection approaches, the proposed sparse Naive Bayes obtains competitive results regarding accuracy, sparsity and running times for balanced datasets. For datasets with unbalanced classes (or classes of different importance), a better compromise between the classification rates of the different classes is achieved.
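The abstract names three ingredients: sparsity that respects the correlation structure of the covariates, a pluggable performance measure, and performance constraints on classes of higher interest. The sketch below is not the authors' algorithm; it is a minimal, hypothetical illustration of how such ingredients could be combined around scikit-learn's GaussianNB, assuming a greedy forward search, absolute Pearson correlation as the dependence measure, balanced accuracy as the performance measure, and a minimum per-class recall as the constraint.

```python
# Illustrative sketch only: a greedy, correlation-aware feature-selection wrapper
# around Gaussian Naive Bayes. This is NOT the paper's method; it only mimics the
# three ingredients described in the abstract.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import balanced_accuracy_score, recall_score

def sparse_nb_select(X, y, corr_threshold=0.8, min_class_recall=0.0, cv=5):
    """Greedy forward selection of features for GaussianNB (X: numpy array).

    - Candidates too correlated (|corr| > corr_threshold) with an already
      selected feature are skipped, a crude proxy for respecting the
      correlation structure of the covariates.
    - Candidates are scored by cross-validated balanced accuracy; any other
      measure could be plugged in here.
    - A candidate is accepted only if every class keeps recall >= min_class_recall.
    """
    n_features = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected, best_score = [], -np.inf

    improved = True
    while improved:
        improved = False
        for j in range(n_features):
            if j in selected:
                continue
            # Skip features highly correlated with the current selection.
            if selected and corr[j, selected].max() > corr_threshold:
                continue
            trial = selected + [j]
            y_pred = cross_val_predict(GaussianNB(), X[:, trial], y, cv=cv)
            score = balanced_accuracy_score(y, y_pred)
            per_class_recall = recall_score(y, y_pred, average=None)
            if score > best_score and per_class_recall.min() >= min_class_recall:
                best_score, selected, improved = score, trial, True
                break  # restart the scan with the enlarged feature set
    return selected, best_score

# Example use (hypothetical data): keep recall of every class above 0.6.
# selected, score = sparse_nb_select(X, y, min_class_recall=0.6)
```

The per-class recall constraint is a simple stand-in for the abstract's "performance constraints for groups of higher interest"; the real method additionally exploits dependence measures and clustering to guide the search.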
