Article

Co-Detection of crowdturfing microblogs and spammers in online social networks

Journal

Publisher

SPRINGER
DOI: 10.1007/s11280-019-00727-4

Keywords

Crowdsourcing; Spammer detection; Semi-supervised learning; Online social networks

Funding

  1. National Key R&D Program of China [2017YFB1003000]
  2. National Natural Science Foundation of China [61972087, 61772133, 61472081, 61402104]
  3. Jiangsu Provincial Key Project [BE2018706]
  4. Key Laboratory of Computer Network Technology of Jiangsu Province
  5. Jiangsu Provincial Key Laboratory of Network and Information Security [BM2003201]
  6. Key Laboratory of Computer Network and Information Integration of Ministry of Education of China [93K-9]

The rise of online crowdsourcing services has driven an evolution from traditional spamming accounts, which spread unwanted advertisements and fraudulent content, to a new type of spammer whose behavior resembles that of normal users. Prior research has mainly studied machine accounts and spam separately, but the characteristics of these new spammers and their spam make it difficult for traditional methods to perform well. In this paper, we integrate the study of these new spammers with the study of crowdturfing microblogs, investigating the mechanism of crowdsourcing and the close relationship between crowdturfing spammers and microblogs in order to detect both more precisely. We propose a novel semi-supervised learning framework for co-detecting crowdturfing microblogs and spammers by jointly modeling user behavior, message content, and users' following and retweeting networks. To cope with sparsely labeled datasets, we design a co-detection objective function that minimizes empirical error and propagates the sparse labels to unlabeled samples. The framework has three advantages. First, through in-depth mining of new-type spammers, we aggregate a number of newly identified features that help distinguish normal users from new-type spammers. Second, by modeling both following networks and retweeting networks, we capture the essence of the crowdsourcing mechanism that spammers abuse in crowdturfing microblog diffusion, which markedly increases detection performance. Third, through our objective function based on semi-supervised methods, we overcome the problem of label sparseness and thus handle large, sparsely labeled data more reliably. Extensive experiments on real datasets demonstrate that our method outperforms four baselines on several metrics (precision-recall, AUC, Precision@K, and others). We also develop a robust system whose functions include data collection and availability analysis, spam and spammer detection, and visualization. To make our experiments replicable, we have made our dataset and code openly available at https://github.com/sunxiangguo/Crowdturfing.
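
The abstract does not spell out the co-detection objective, so the following is only a minimal sketch of the general idea it describes: fit the few known labels while letting them spread along network ties to unlabeled nodes. It assumes a generic graph-regularized objective (squared error on labeled nodes plus a smoothness penalty weighted by a hypothetical parameter `mu`) solved by simple fixed-point iteration. The function names, the toy graph, and the edge weights are illustrative assumptions and are not taken from the paper or its repository. A small Precision@K helper, one of the metrics named above, is included for ranking evaluation.

```python
def propagate_labels(edges, labels, n_nodes, mu=1.0, n_iter=200):
    """Generic graph-regularised label propagation (illustrative, not the paper's method).

    edges  : list of (i, j, weight) undirected ties, e.g. follow or retweet links
    labels : dict mapping the few labelled nodes to 1.0 (spam) or 0.0 (normal)
    Returns one score per node; higher means more spam-like.
    """
    neighbours = [[] for _ in range(n_nodes)]
    for i, j, w in edges:
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))

    # Labelled nodes start at their label, unlabelled nodes at a neutral 0.5.
    scores = [labels.get(i, 0.5) for i in range(n_nodes)]
    for _ in range(n_iter):
        new_scores = []
        for i in range(n_nodes):
            fit = 1.0 if i in labels else 0.0  # empirical-error term applies to labelled nodes only
            num = fit * labels.get(i, 0.0) + mu * sum(w * scores[j] for j, w in neighbours[i])
            den = fit + mu * sum(w for _, w in neighbours[i])
            new_scores.append(num / den if den > 0 else scores[i])
        scores = new_scores
    return scores


def precision_at_k(scores, truth, k):
    """Fraction of true positives among the k highest-scoring items."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(truth[i] for i in ranked[:k]) / k


# Toy data (assumed): two tight triangles joined by one weak tie, one label per triangle.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 0.1)]
labels = {0: 1.0, 5: 0.0}
truth = [1, 1, 1, 0, 0, 0]

scores = propagate_labels(edges, labels, n_nodes=6)
print([round(s, 3) for s in scores])
print(precision_at_k(scores, truth, k=3))
```

In this toy setting the two unlabeled members of each triangle inherit a score close to that of their labeled neighbor, which is the label-dissemination effect the abstract refers to. A real co-detection model would couple user scores and microblog scores and include feature-based terms, which this sketch omits.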

Reviews

Primary Rating

4.5
Not enough ratings
