4.7 Article

A new semantic-based feature selection method for spam filtering

Journal

APPLIED SOFT COMPUTING
Volume 76, Issue -, Pages 89-104

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2018.12.008

Keywords

Feature selection methods; Text mining; Spam filtering; e-mail; Classification; Machine learning

Funding

  1. Xunta de Galicia [ED481B 2017/018]
  2. Conselleria de Cultura, Educacion e Ordenacion Universitaria (Xunta de Galicia)
  3. FEDER (European Union)
  4. Spanish Ministry of Science and Innovation [MTM2017-89422-P]
  5. Xunta de Galicia (Spain) [ED431C2016-040]
  6. SMEIC/SRA/ERDF [TIN2017-84658-C2-1-R]

The Internet has emerged as a powerful infrastructure for worldwide communication and interaction between people. Some unethical uses of this technology (for instance, spam or viruses) have made it challenging to develop mechanisms that guarantee an affordable and secure user experience. This study deals with the massive delivery of unwanted content or advertising campaigns without the consent of target users (also known as spam). Currently, words (tokens) are selected using feature selection schemes and are then used to create feature vectors for training different Machine Learning (ML) approaches. This study introduces a new feature selection method that takes advantage of a semantic ontology to group words into topics and uses those topics to build feature vectors. To evaluate it, we compared the performance of nine well-known ML approaches in conjunction with (i) Information Gain, the most popular feature selection method in the spam-filtering domain; (ii) Latent Dirichlet Allocation, a generative statistical model that allows sets of observations to be explained by unobserved groups that describe why some parts of the data are similar; and (iii) our semantic-based feature selection proposal. Results have shown the suitability and additional benefits of topic-driven methods for developing and deploying high-performance spam filters. (C) 2018 Elsevier B.V. All rights reserved.
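The abstract contrasts token-level selection by Information Gain with grouping words into topics before building feature vectors. The following is a minimal sketch of both ideas, assuming each e-mail has already been tokenized into a set of words. The `word_to_topic` dictionary is a hypothetical, hand-written stand-in for the semantic ontology; the paper's actual ontology, grouping procedure, and classifiers are not reproduced here, and all function names are illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, token):
    """IG(C; t) = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)."""
    with_t = [y for d, y in zip(docs, labels) if token in d]
    without_t = [y for d, y in zip(docs, labels) if token not in d]
    n = len(labels)
    h_c = entropy(list(Counter(labels).values()))
    h_cond = 0.0
    for subset in (with_t, without_t):
        if subset:  # skip empty partitions (they contribute nothing)
            h_cond += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_c - h_cond


def top_k_tokens(docs, labels, k):
    """Rank the vocabulary by IG (ties broken alphabetically) and keep k tokens."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


def token_vector(doc, features):
    """Binary presence vector over the selected tokens."""
    return [1 if t in doc else 0 for t in features]


def topic_vector(doc, word_to_topic, topics):
    """Count how many words of the document fall into each topic.

    Words absent from the (hypothetical) word-to-topic mapping are ignored.
    """
    counts = Counter(word_to_topic[w] for w in doc if w in word_to_topic)
    return [counts[t] for t in topics]


# Toy, made-up corpus used only to exercise the functions.
docs = [
    {"free", "offer", "money"},
    {"meeting", "tomorrow"},
    {"free", "prize"},
    {"project", "meeting"},
]
labels = ["spam", "ham", "spam", "ham"]

# Hypothetical stand-in for an ontology lookup (not from the paper).
word_to_topic = {
    "free": "commerce", "offer": "commerce", "prize": "commerce",
    "money": "finance", "meeting": "work", "project": "work",
}
topics = ["commerce", "finance", "work"]

selected = top_k_tokens(docs, labels, 2)
print(selected)                                            # ['free', 'meeting']
print(token_vector(docs[2], selected))                     # [1, 0]
print(topic_vector(docs[0], word_to_topic, topics))        # [2, 1, 0]
```

In this sketch, the topic vector has one dimension per topic rather than one per selected token, so its length is fixed by the topic set and not by the vocabulary. Any word the mapping assigns to a topic contributes to that dimension, whether or not it was among the highest-ranked tokens.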
