Article

A weighted feature enhanced Hidden Markov Model for spam SMS filtering

NEUROCOMPUTING (2021)

Journal

NEUROCOMPUTING

Volume 444, Pages 48-58

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2021.02.075

Keywords

Hidden Markov Model (HMM); Short Messaging Service (SMS); Spam filtering; Weighted features; Text classification

Category

Computer Science, Artificial Intelligence

Funding

Soft Engineering of Key Subjects Construction at Shanghai Polytechnic University [xxkzd1604]

Summary

Short Message Service (SMS) is widely used in daily life, but it is also misused by spammers. Researchers have developed rule-based and content-based filtering techniques, as well as machine learning methods, to combat spam messages. The weighted feature enhanced Hidden Markov Model (HMM) improves on the authors' earlier HMM method in both filtering accuracy and speed.

Abstract

Short message service (SMS) is one of the most favored communication services in daily life. However, this service is being misused by spammers. Rule-based systems (RBS) and content-based filtering (CBF) techniques have been developed to filter out spam messages. New rules can easily be added to an RBS, but throughput usually drops as the number of rules grows. CBF techniques use machine learning methods to extract features from the SMS message body according to word frequency and distribution; those built on the bag-of-words (BoW) assumption ignore word order. Striving to improve performance, researchers developed hybrid models that made the algorithms ever more complex. In addition, the need to frequently retrain and redeploy time-consuming models forces the anti-spam industry to still rely mainly on rule-based systems, whose throughput issue remains unsolved. A discrete Hidden Markov Model (HMM) was proposed in our previous study to address these issues, and the HMM method achieved performance comparable to that of deep learning methods. To further improve the performance of the HMM method, we propose a new approach that weights and labels the words in an SMS message to form the observation sequence for the HMM. The weighted feature enhanced HMM achieves higher accuracy and much faster training and filtering speed, meeting the requirements of the anti-spam industry. The performance comparison with other machine learning methods is conducted on the same open data set from the repository maintained by the University of California, Irvine (UCI). Experimental results show that the weighted feature enhanced HMM outperforms LSTM (long short-term memory) and comes close to CNN (convolutional neural network) in terms of classification accuracy. In addition, a Chinese SMS data set is used to further validate filtering accuracy and filtering speed. (c) 2021 Elsevier B.V. All rights reserved.
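The contrast the abstract draws between order-blind BoW features and an HMM that scores an ordered observation sequence can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the two-state toy parameters and the integer symbol encoding are hypothetical, and the code does not reproduce the authors' word weighting and labeling scheme, which the abstract does not specify. It shows only that two sequences with identical word counts can receive different likelihoods under a discrete HMM.

```python
import math
from collections import Counter


def bag_of_words(message):
    """Count word occurrences; word order is discarded."""
    return Counter(message.lower().split())


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM.

    obs   : non-empty list of observation symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    Returns log P(obs | model).
    """
    if not obs:
        raise ValueError("obs must contain at least one symbol")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical toy model: two hidden states that tend to alternate,
# each of which mostly emits "its own" symbol (0 or 1).
START = [0.9, 0.1]
TRANS = [[0.1, 0.9], [0.9, 0.1]]
EMIT = [[0.9, 0.1], [0.1, 0.9]]

# Same words, different order: the BoW representations are identical.
same_bag = bag_of_words("win a free prize") == bag_of_words("prize free a win")

# Same symbol counts, different order: the HMM scores differ.
ll_alternating = forward_log_likelihood([0, 1, 0, 1], START, TRANS, EMIT)
ll_grouped = forward_log_likelihood([0, 0, 1, 1], START, TRANS, EMIT)
```

In a two-class filter built this way, one such model would typically be trained per class (spam and ham), and a message assigned to the class whose model gives the higher log-likelihood; whether the paper follows exactly this arrangement is not stated in the abstract.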
