Article

Sentiment analysis and spam filtering using the YAC2 clustering algorithm with transferability

COMPUTERS & INDUSTRIAL ENGINEERING (2022)

Journal

COMPUTERS & INDUSTRIAL ENGINEERING

Volume 165

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.cie.2022.107959

Keywords

Clustering Analysis; YAC2; Transferability; Machine learning; Sentiment analysis; Spam filtering

Categories

Computer Science, Interdisciplinary Applications; Engineering, Industrial

Summary

This paper introduces a relatively simple, transferable, unsupervised approach to text classification for sentiment analysis and spam filtering. By combining a new clustering algorithm with domain-transferable feature engineering, the integrated solution achieves better accuracy than traditional methods and transfers across different datasets.

Abstract

Two notable applications of text classification are sentiment analysis and spam filtering. Traditional machine learning approaches to text classification are often complex, non-transferable, and dependent on supervision. This paper introduces an unsupervised approach to text classification that is relatively simple and transfers between problem domains, while providing accuracy comparable to or better than established alternatives. We present an integrated solution that combines a new clustering algorithm, Yet Another Clustering Algorithm (YAC2), with a domain-transferable feature engineering approach for Twitter sentiment analysis and spam filtering of YouTube comments. We evaluate the effectiveness of this integrated solution for Twitter sentiment analysis using three datasets: Starbucks, Verizon, and Southwest Airlines. YouTube spam filtering is evaluated using four datasets: Psy, LMFAO, Shakira, and Katy Perry. We compare the results with established clustering solutions: KNN, Spectral, and DBSCAN. Our integrated solution performs better than all the alternatives for sentiment analysis. For spam filtering, YAC2 and KNN perform within 1% of each other and far better than Spectral and DBSCAN on all datasets. Additionally, our feature engineering approach improves accuracy compared with a traditional method, while significantly reducing model dimensionality and matrix sparsity and providing transferability across the datasets tested.
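The claim about dimensionality and sparsity can be illustrated with a small sketch. The abstract does not describe the paper's actual features, so everything here is an assumption made for illustration: the "traditional method" is taken to be a bag-of-words matrix, and the toy comments, the lexicons (`POSITIVE`, `NEGATIVE`, `SPAM_CUES`), and the five engineered columns are hypothetical. The sketch only shows why a fixed, vocabulary-independent feature set tends to be both narrower and denser than a per-dataset vocabulary.

```python
import re

# Toy comments invented for illustration; not taken from the paper's datasets.
COMMENTS = [
    "Check out my channel http://example.com !!!",
    "I love this song so much",
    "Subscribe to me for free gifts",
    "Terrible service, never flying with them again",
]

# Hypothetical lexicons; the paper's real features are not given in the abstract.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"terrible", "never", "bad"}
SPAM_CUES = {"subscribe", "free", "channel", "check"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs):
    """Build a count matrix with one column per distinct token in the corpus."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            row[index[tok]] += 1
        rows.append(row)
    return rows


def engineered(docs):
    """Build a matrix with a fixed set of columns, independent of any vocabulary."""
    rows = []
    for doc in docs:
        toks = tokenize(doc)
        rows.append([
            sum(t in POSITIVE for t in toks),   # positive-word count
            sum(t in NEGATIVE for t in toks),   # negative-word count
            sum(t in SPAM_CUES for t in toks),  # spam-cue count
            int("http" in doc.lower()),         # contains a link
            len(toks),                          # comment length in tokens
        ])
    return rows


def sparsity(matrix):
    """Return the fraction of zero entries in a list-of-lists matrix."""
    cells = [v for row in matrix for v in row]
    return sum(v == 0 for v in cells) / len(cells)


bow = bag_of_words(COMMENTS)
eng = engineered(COMMENTS)
print("bag-of-words:", len(bow[0]), "columns, sparsity", round(sparsity(bow), 2))
print("engineered:  ", len(eng[0]), "columns, sparsity", round(sparsity(eng), 2))
```

Because the engineered columns are defined without reference to any corpus, the same five-column representation can be computed for a different dataset unchanged, which is one plausible reading of "transferability" here; the bag-of-words columns, by contrast, change with every vocabulary.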
