Review

A review of spam email detection: analysis of spammer strategies and the dataset shift problem

Journal

ARTIFICIAL INTELLIGENCE REVIEW
Volume 56, Issue 2, Pages 1145-1173

Publisher

SPRINGER
DOI: 10.1007/s10462-022-10195-4

Keywords

Spam email detection; Dataset shift; Adversarial machine learning; Spammer strategies; Feature selection

Spam emails are no longer just annoying advertisements but a growing source of scams and attacks. Although machine learning-based spam filters report high performance in academic journals, users still receive fraudulent and malicious emails. The main challenges in this field are the dynamic nature of the environment and the presence of spammers as adversaries.
Spam emails have traditionally been seen as annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. To ensure users' security and integrity, organisations and researchers aim to develop robust filters for spam email detection. Most recent spam filters based on machine learning algorithms published in academic journals report very high performance, yet users still report a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem, and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one focuses on the problems that this constantly changing environment poses. Moreover, we analyse the different strategies spammers use to contaminate emails, and we review state-of-the-art techniques for developing filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching up to 48.81%.
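
The abstract's central point, that a filter which scores well on a fixed test set can fail once spammers change how they write, can be illustrated with a toy experiment. The Python sketch below is a minimal illustration under stated assumptions, not the review's experimental protocol: it trains a hand-written multinomial naive Bayes classifier with Laplace smoothing on six made-up emails, then evaluates it on spam written in the same style and on spam that mimics two spammer tactics, obfuscating trigger words (e.g. "fr33") and padding the message with words typical of legitimate mail. The class `NaiveBayes`, the helper `error_rate`, and all example emails are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately naive)."""
    return text.lower().split()


class NaiveBayes:
    """Hypothetical toy multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)              # documents per class, for the prior
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))   # token frequencies per class
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # tokens never seen in training carry no evidence
                likelihood = (self.counts[c][tok] + 1) / (self.totals[c] + len(self.vocab))
                score += math.log(likelihood)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def error_rate(model, docs, labels):
    """Fraction of documents whose predicted label differs from the true one."""
    wrong = sum(model.predict(d) != y for d, y in zip(docs, labels))
    return wrong / len(docs)


# Hypothetical training emails, standing in for an older corpus.
train_docs = [
    "win free money now", "cheap pills free offer", "claim your free prize now",
    "meeting agenda for monday", "please review the attached report",
    "lunch with the team on friday",
]
train_labels = ["spam"] * 3 + ["ham"] * 3

ham_test = ["review the agenda for friday", "team meeting on monday"]
spam_same_style = ["free money offer now", "claim free pills"]
# Shifted spam: obfuscated trigger words plus padding with legitimate-looking words.
spam_shifted = ["fr33 m0ney for the team", "cla1m pr1ze on monday"]
test_labels = ["spam"] * 2 + ["ham"] * 2

model = NaiveBayes().fit(train_docs, train_labels)
before = error_rate(model, spam_same_style + ham_test, test_labels)
after = error_rate(model, spam_shifted + ham_test, test_labels)
print(f"error without shift: {before:.2f}")  # 0.00
print(f"error with shift:    {after:.2f}")   # 0.50
```

On this toy data the error rate rises from 0.00 to 0.50: the obfuscated tokens fall outside the training vocabulary and are ignored, so only the padded, legitimate-looking words drive the decision and both shifted spam emails are classified as ham. The 48.81% figure reported in the abstract comes from the authors' own experiments and is not reproduced here.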
