4.7 Article

Identifying malicious social media contents using multi-view Context-Aware active learning

Publisher

ELSEVIER
DOI: 10.1016/j.future.2019.03.015

Keywords

Graph search; Multi-view classification; Active learning; Malicious tweets classification; Semi-supervised learning

Abstract

This paper presents a semi-supervised, multi-view, active learning method that uses an optimized set of the most informative samples, together with domain-specific context information, to identify malicious forum content on web-based social media platforms efficiently and effectively. Prior research shows that automated identification of malicious forum posts, which also helps detect the key suspects associated with them, faces several challenges: (1) online data, particularly social media data, originate from diverse and heterogeneous sources and are largely unstructured; (2) the characteristics of online data evolve quickly; and (3) only limited ground truth data are available to support the development of effective classifiers in a strictly supervised setting. To address these challenges, the proposed human-machine collaborative, semi-supervised learning method is designed to identify harmful, provocative, or fabricated forum content from only a small number of annotated samples. The framework starts by training initial view-dependent classifiers on a limited labeled data collection. Each classifier then evolves interactively into a more sophisticated model by observing data patterns in a shared shortlist of the most informative samples, which is identified through a graph-based optimization solved with a maximum flow algorithm. With a context-rich metric defined in a data-driven manner, the framework learns a sufficiently robust classification model from only a small number of human-annotated samples, typically 1-2 orders of magnitude fewer than a fully supervised solution requires. We validate the method on a large collection of flagged words from a wide range of origins; these words appear frequently in web-based forums and were manually verified by multiple experienced, independent domain experts. © 2019 Elsevier B.V. All rights reserved.
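The abstract does not spell out how the graph is built, so the following is only a minimal sketch of the general idea of choosing a shortlist of informative samples with a minimum cut computed by maximum flow. Everything specific here is an assumption made for illustration and is not taken from the paper: the `view_uncertainty` score, the `label_cost` and `smooth` parameters, the toy probabilities, and the choice of Edmonds-Karp as the maximum flow routine.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dense capacity matrix.

    Returns (flow_value, source_side), where source_side is the set of
    nodes still reachable from the source in the final residual graph.
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    source_side = {i for i in range(n) if parent[i] != -1}
    return flow, source_side


def view_uncertainty(view_probs):
    """Hypothetical informativeness score for one sample.

    view_probs holds P(malicious) from each view-dependent classifier.
    The score is high when the mean is near 0.5 or the views disagree.
    """
    mean_p = sum(view_probs) / len(view_probs)
    margin = 1.0 - 2.0 * abs(mean_p - 0.5)
    disagreement = max(view_probs) - min(view_probs)
    return 0.5 * (margin + disagreement)


def select_informative(uncertainty, similarity, label_cost=0.4, smooth=0.2):
    """Pick samples to annotate as the source side of a minimum cut.

    Assumed energy: skipping sample i costs uncertainty[i], annotating it
    costs label_cost, and similar samples are nudged toward the same decision.
    """
    n = len(uncertainty)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = uncertainty[i]  # cut when sample i is skipped
        cap[i][t] = label_cost      # cut when sample i is selected
        for j in range(n):
            if i != j:
                cap[i][j] = smooth * similarity[i][j]
    _, side = max_flow_min_cut(cap, s, t)
    return sorted(i for i in side if i < n)


# Toy data: four posts scored by two views (for example, text and context).
probs = [[0.95, 0.90], [0.55, 0.45], [0.90, 0.20], [0.05, 0.10]]
similarity = [[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]]
scores = [view_uncertainty(p) for p in probs]
selected = select_informative(scores, similarity)
print(selected)
```

On this toy input the shortlist is `[1, 2]`: the post on which both views hover near 0.5 and the post on which the views disagree strongly. In an active learning loop, a shortlist like this would be sent to a human annotator and the view-dependent classifiers retrained on the new labels. The attractive pairwise term used here favors consistent decisions for similar samples; a formulation that rewards diversity would need a different energy.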
