期刊
KNOWLEDGE-BASED SYSTEMS
卷 64, 期 -, 页码 22-31出版社
ELSEVIER
DOI: 10.1016/j.knosys.2014.03.015
关键词
Spam detection; Binary Particle Swarm Optimization; Mutation operator; Feature selection; Wrapper; Premature convergence; Decision tree; Cost matrix
In this paper, we proposed a novel spam detection method that focused on reducing the false positive error of mislabeling nonspam as spam. First, we used the wrapper-based feature selection method to extract crucial features. Second, the decision tree was chosen as the classifier model with C4.5 as the training algorithm. Third, the cost matrix was introduced to give different weights to two error types, i.e., the false positive and the false negative errors. We define the weight parameter as a to adjust the relative importance of the two error types. Fourth, K-fold cross validation was employed to reduce out-of-sample error. Finally, the binary PSO with mutation operator (MBPSO) was used as the subset search strategy. Our experimental dataset contains 6000 emails, which were collected during the year of 2012. We conducted a Kolmogorov-Smirnov hypothesis test on the capital-run-length related features and found that all the p values were less than 0.001. Afterwards, we found alpha = 7 was the most appropriate in our model. Among seven meta-heuristic algorithms, we demonstrated the MBPSO is superior to GA, RSA, PSO, and BPSO in terms of classification performance. The sensitivity, specificity, and accuracy of the decision tree with feature selection by MBPSO were 91.02%, 97.51%, and 94.27%, respectively. We also compared the MBPSO with conventional feature selection methods such as SFS and SBS. The results showed that the MBPSO performs better than SFS and SBS. We also demonstrated that wrappers are more effective than filters with regard to classification performance indexes. It was clearly shown that the proposed method is effective, and it can reduce the false positive error without compromising the sensitivity and accuracy values. (c) 2014 Elsevier B.V. All rights reserved.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据