Article

A Discrete Hidden Markov Model for SMS Spam Detection

APPLIED SCIENCES-BASEL (2020)

Journal

APPLIED SCIENCES-BASEL

Volume 10, Issue 14, Pages -

Publisher

MDPI

DOI: 10.3390/app10145011

Keywords

short messaging service (SMS); spam detection; hidden Markov model (HMM); text classification; natural language processing (NLP)

Categories

Chemistry, Multidisciplinary; Engineering, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied

Funding

Soft Engineering of Key Subjects Construction in Shanghai Polytechnic University [xxkzd1604]
US National Science Foundation [CNS-1801811]

Abstract

Many machine learning methods have been applied to short messaging service (SMS) spam detection, including traditional methods such as naive Bayes (NB), the vector space model (VSM), and the support vector machine (SVM), and newer methods such as long short-term memory (LSTM) and the convolutional neural network (CNN). These methods are based on the well-known bag of words (BoW) model, which assumes that a document is an unordered collection of words. This assumption overlooks an important piece of information: word order. Moreover, term frequency, which counts the number of occurrences of each word in an SMS message, cannot distinguish the importance of words because SMS messages are limited in length. This paper proposes a new method based on the discrete hidden Markov model (HMM) that uses word order information and addresses the low term frequency issue in SMS spam detection. The widely used SMS spam dataset from the UCI machine learning repository is used to analyze the performance of the proposed HMM method. Its overall performance is comparable to that of deep learning approaches using CNN and LSTM models. A Chinese SMS spam dataset with 2000 messages is used for further performance evaluation. Experiments show that the proposed HMM method is not language-sensitive and can identify spam with high accuracy on both datasets.
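
The word-order point in the abstract can be illustrated with a small sketch. It is not the paper's model: the two-state HMM, its hand-picked probabilities, the three-word vocabulary, and the helper names `bag_of_words` and `forward_likelihood` are all assumptions made for illustration. The sketch shows that two messages with the same words in a different order have identical BoW counts, while a discrete HMM scored with the standard forward algorithm can assign them different likelihoods.

```python
from collections import Counter


def bag_of_words(tokens):
    """Return word counts; the order of the tokens is discarded."""
    return Counter(tokens)


def forward_likelihood(obs, start, trans, emit):
    """Likelihood of a symbol sequence under a discrete HMM (forward algorithm)."""
    n = len(start)
    # Initialisation: probability of starting in each state and emitting obs[0].
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Recursion: propagate through the transitions, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)


# Toy vocabulary and parameters, chosen by hand (assumptions, not learned values).
vocab = {"free": 0, "call": 1, "now": 2}
start = [0.9, 0.1]                 # initial state probabilities
trans = [[0.2, 0.8],               # trans[p][s]: probability of moving p -> s
         [0.6, 0.4]]
emit = [[0.7, 0.2, 0.1],           # emit[s][k]: probability state s emits word k
        [0.1, 0.3, 0.6]]

msg_a = ["free", "call", "now"]
msg_b = ["now", "call", "free"]    # same words, reversed order

same_bow = bag_of_words(msg_a) == bag_of_words(msg_b)
like_a = forward_likelihood([vocab[w] for w in msg_a], start, trans, emit)
like_b = forward_likelihood([vocab[w] for w in msg_b], start, trans, emit)

print("identical BoW counts:", same_bow)
print("HMM likelihoods:", like_a, like_b)
```

In a classifier built along these lines, one could compare the likelihoods of a message under a spam-trained and a ham-trained model; how the paper itself defines states, observations, and the decision rule is not stated in the abstract.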
