
A Novel Stream Clustering Framework for Spam Detection in Twitter

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSS.2019.2910818

Keywords

Clustering classification; DenStream; incremental Naive Bayes (INB); spam detection; stream clustering

Stream clustering methods have been used repeatedly for spam filtering to categorize incoming messages or tweets into spam and nonspam clusters. These methods assume that each cluster consists of a number of neighboring small (micro) clusters, each with a symmetric distribution. This assumption does not necessarily hold, however, and large microclusters may have asymmetric distributions. To improve the assignment accuracy of earlier methods in their online phase, we suggest replacing the Euclidean distance with a set of classifiers that assign incoming samples to the most relevant microcluster, whatever its distribution. Here, a set of incremental Naive Bayes (INB) classifiers is trained for microclusters whose population exceeds a threshold. These INBs can capture both the mean and the boundary of a microcluster, whereas the Euclidean distance considers only the cluster mean and is inaccurate for large asymmetric microclusters. In this paper, DenStream is extended with the proposed framework; the resulting method is called INB-DenStream. To show the effectiveness of INB-DenStream, state-of-the-art methods such as DenStream, StreamKM++, and CluStream were also applied to the Twitter datasets, and performance was measured in terms of purity, general precision, general recall, F1 measure, parameter sensitivity, and computational complexity. The results indicated that our method outperforms its rivals on almost all of the datasets.
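
The difference between distance-based and classifier-based assignment can be illustrated with a small sketch. This is not the authors' implementation: the class `MicroCluster`, the helpers `build`, `assign_euclidean`, and `assign_naive_bayes`, and the toy data are all hypothetical, and a per-feature Gaussian likelihood is assumed as a stand-in for the INB classifiers. The population threshold mentioned in the abstract and the rest of the DenStream machinery are omitted. The sketch only shows how a sample can lie closer to the mean of a tight microcluster while being far more plausible under a wide, elongated one.

```python
import math


class MicroCluster:
    """Hypothetical microcluster keeping running per-feature mean and variance."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per feature

    def update(self, x):
        # Welford's incremental update: no past samples need to be stored.
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self, i, floor=1e-6):
        # Sample variance with a small floor to avoid division by zero.
        if self.n < 2:
            return floor
        return max(self.m2[i] / (self.n - 1), floor)

    def euclidean(self, x):
        return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean)))

    def log_likelihood(self, x):
        # Gaussian Naive Bayes assumption: features are independent.
        total = 0.0
        for i, xi in enumerate(x):
            var = self.variance(i)
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (xi - self.mean[i]) ** 2 / (2 * var)
        return total


def build(points):
    cluster = MicroCluster(len(points[0]))
    for p in points:
        cluster.update(p)
    return cluster


def assign_euclidean(x, clusters):
    # Baseline: index of the microcluster with the nearest mean.
    return min(range(len(clusters)), key=lambda k: clusters[k].euclidean(x))


def assign_naive_bayes(x, clusters):
    # Index of the microcluster with the highest log prior + log likelihood.
    total = sum(c.n for c in clusters)
    return max(
        range(len(clusters)),
        key=lambda k: math.log(clusters[k].n / total) + clusters[k].log_likelihood(x),
    )


# Toy data (assumed): a wide, elongated microcluster and a tight one.
wide = build([(-4.0, 0.1), (-2.0, -0.1), (0.0, 0.0), (2.0, 0.1), (4.0, -0.1)])
tight = build([(5.9, 0.1), (6.1, -0.1), (6.0, 0.0), (5.9, -0.1), (6.1, 0.1)])
clusters = [wide, tight]

sample = (3.5, 0.0)
print("Euclidean assignment:", assign_euclidean(sample, clusters))
print("Naive Bayes assignment:", assign_naive_bayes(sample, clusters))
```

In this toy setting the sample is 2.5 units from the mean of the tight microcluster and 3.5 units from the mean of the wide one, so the mean-only rule picks the tight microcluster, while the likelihood-based rule, which accounts for spread, picks the wide one.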
