
Co-Detection of crowdturfing microblogs and spammers in online social networks

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2020)

Journal

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS

Volume 23, Issue 1, Pages 573-607

Publisher

SPRINGER

DOI: 10.1007/s11280-019-00727-4

Keywords

Crowdsourcing; Spammer detection; Semi-supervised learning; Online social networks

Funding

National Key R&D Program of China [2017YFB1003000]
National Natural Science Foundation of China [61972087, 61772133, 61472081, 61402104]
Jiangsu Provincial Key Project [BE2018706]
Key Laboratory of Computer Network Technology of Jiangsu Province
Jiangsu Provincial Key Laboratory of Network and Information Security [BM2003201]
Key Laboratory of Computer Network and Information Integration of Ministry of Education of China [93K-9]


Abstract

The rise of online crowdsourcing services has prompted an evolution from traditional spamming accounts, which spread unwanted advertisements and fraudulent content, into novel spammers whose behavior resembles that of normal users. Prior research has mainly studied machine accounts and spam separately, but the characteristics of these new types of spammers and spam make it difficult for traditional methods to perform well. In this paper, we integrate the study of these new types of spammers with the study of crowdturfing microblogs, investigating the mechanism of crowdsourcing and the close relationship between crowdturfing spammers and microblogs in order to detect both more precisely. We propose a novel semi-supervised learning framework for co-detecting crowdturfing microblogs and spammers by jointly modeling user behavior, message content, and users' following and retweeting networks. To meet the challenge of sparsely labeled datasets, we design a co-detection objective function that minimizes empirical error and propagates the sparse labels to unlabeled samples. The advantage of this framework is threefold. First, through in-depth mining of new-type spammers, we aggregate a number of newly identified features that help distinguish normal users from new-type spammers. Second, by modeling both following networks and retweeting networks, we characterize the essence of the crowdsourcing mechanism that spammers abuse in crowdturfing microblog diffusion, which markedly increases detection performance. Third, through our semi-supervised objective function, we overcome the problem of label sparseness and so deal more reliably with large, sparsely labeled data. Extensive experiments on real datasets demonstrate that our method outperforms four baselines on various metrics (Precision-Recall, AUC, Precision@K, and so on). We also develop a robust system whose functions include data collection and availability analysis, spam and spammer detection, and visualization. To make our experiments replicable, we have made our dataset and code openly available at https://github.com/sunxiangguo/Crowdturfing.
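
The abstract does not state the objective function itself, so the label dissemination idea can only be illustrated with a generic stand-in. The sketch below uses standard graph label propagation on a symmetrically normalized adjacency matrix; it is not the authors' co-detection method. The function name `propagate_labels`, the parameters `alpha` and `iterations`, and the six-node toy graph are all assumptions made for illustration.

```python
import numpy as np


def propagate_labels(adjacency, seed_labels, alpha=0.9, iterations=200):
    """Generic graph label propagation (illustrative, not the paper's objective).

    adjacency   : symmetric (n, n) array of non-negative edge weights
    seed_labels : length-n array, +1 = labeled spammer, -1 = labeled normal,
                  0 = unlabeled
    alpha       : weight given to neighbors' scores versus the seed labels
    """
    adjacency = np.asarray(adjacency, dtype=float)
    seeds = np.asarray(seed_labels, dtype=float)

    # Symmetric normalization: S = D^{-1/2} A D^{-1/2}
    degree = adjacency.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]

    # Each step mixes neighbor scores with the original sparse labels.
    scores = seeds.copy()
    for _ in range(iterations):
        scores = alpha * (normalized @ scores) + (1.0 - alpha) * seeds
    return scores


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge.
toy_adjacency = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
toy_seeds = np.array([1, 0, 0, 0, 0, -1])  # only two of the six nodes are labeled

toy_scores = propagate_labels(toy_adjacency, toy_seeds)
print(np.sign(toy_scores))
```

In this toy setting the unlabeled nodes take the sign of the labeled node in their own triangle, which is the behavior a sparse-label method relies on. A co-detection framework of the kind the abstract describes would couple several such graphs (users, microblogs, following and retweeting links) rather than a single one.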
