Article

Identifying malicious social media contents using multi-view Context-Aware active learning

Publisher

ELSEVIER
DOI: 10.1016/j.future.2019.03.015

Keywords

Graph search; Multi-view classification; Active learning; Malicious tweets classification; Semi-supervised learning

This paper presents a semi-supervised, multi-view, active learning method that uses an optimized set of the most informative samples and exploits domain-specific context information to efficiently and effectively identify malicious forum content on web-based social media platforms. As research shows, the automated identification of malicious forum posts, which also helps in detecting their associated key suspects in web forums, faces numerous challenges: (1) online data, particularly social media data, originate from diverse and heterogeneous sources and are largely unstructured; (2) online data characteristics evolve quickly; and (3) only limited ground-truth data is available to support the development of effective classification technologies in a strictly supervised scenario. To address these challenges, the proposed human-machine collaborative, semi-supervised learning method is designed to efficiently and effectively identify harmful, provocative, or fabricated forum content by observing only a small number of annotated samples. Our learning framework starts by building initial view-dependent classifiers from a small labeled data collection and then lets each of them evolve dynamically, in an interactive manner, into a more sophisticated model by observing data patterns in a shared shortlist of the most informative samples; this shortlist is identified via a graph-based optimization method solved with a maximum flow algorithm. By defining a context-rich metric in a data-driven manner, the proposed framework learns a sufficiently robust classification model that requires only a small number of human-annotated samples, typically one to two orders of magnitude fewer than a fully supervised solution. We validate our method on a large collection of flagged words of diverse origins; these words appear frequently in web-based forums and were manually verified by multiple experienced, independent domain experts.

(C) 2019 Elsevier B.V. All rights reserved.
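The abstract describes a loop in which per-view classifiers, built from a small labeled seed, repeatedly ask a human annotator about a shared shortlist of informative samples. The sketch below illustrates that loop under stated assumptions and is not the authors' method. Each sample carries one hypothetical feature vector per view, each view uses a hypothetical nearest-centroid classifier, and the paper's graph-based, max-flow sample selection is replaced by a simpler heuristic that ranks unlabeled samples by cross-view disagreement plus mean prediction uncertainty. The `oracle` callback stands in for the human expert; all function and class names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, ...]  # one feature vector per view (views are hypothetical)


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidView:
    """Hypothetical per-view classifier (not the paper's model): nearest class centroid."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroidView":
        self.centroids: Dict[int, List[float]] = {}
        for label in (0, 1):  # 0 = benign, 1 = malicious
            members = [x for x, lab in zip(X, y) if lab == label]
            if not members:
                raise ValueError(f"no labeled samples for class {label}")
            dim = len(members[0])
            self.centroids[label] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        return self

    def prob_malicious(self, x: Vector) -> float:
        # Two-class softmax over negative squared distances; the gap is clamped
        # so math.exp cannot overflow.
        gap = _sq_dist(x, self.centroids[1]) - _sq_dist(x, self.centroids[0])
        return 1.0 / (1.0 + math.exp(max(-50.0, min(50.0, gap))))


def fit_views(samples: List[Sample], labels: List[int]) -> List[NearestCentroidView]:
    """Train one independent classifier per view."""
    n_views = len(samples[0])
    return [NearestCentroidView().fit([s[v] for s in samples], labels) for v in range(n_views)]


def shortlist(models: List[NearestCentroidView], pool: Dict[int, Sample], k: int) -> List[int]:
    """Heuristic stand-in for the paper's max-flow selection: rank unlabeled samples by
    cross-view disagreement plus mean uncertainty and return the top-k ids."""
    scored = []
    for idx, sample in pool.items():
        probs = [m.prob_malicious(v) for m, v in zip(models, sample)]
        disagreement = max(probs) - min(probs)
        uncertainty = sum(1.0 - abs(2.0 * p - 1.0) for p in probs) / len(probs)
        scored.append((disagreement + uncertainty, idx))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties broken by id
    return [idx for _, idx in scored[:k]]


def active_learning(
    labeled: List[Sample],
    labels: List[int],
    pool: Dict[int, Sample],
    oracle: Callable[[int], int],
    rounds: int = 2,
    k: int = 2,
) -> Tuple[List[NearestCentroidView], List[int]]:
    labeled, labels, pool = list(labeled), list(labels), dict(pool)  # do not mutate inputs
    queried: List[int] = []
    for _ in range(rounds):
        if not pool:
            break
        models = fit_views(labeled, labels)
        for idx in shortlist(models, pool, k):  # one shortlist shared by all views
            labeled.append(pool.pop(idx))
            labels.append(oracle(idx))  # human annotation step
            queried.append(idx)
    return fit_views(labeled, labels), queried


if __name__ == "__main__":
    seed = [((0.0,), (0.0,)), ((10.0,), (10.0,))]  # one benign, one malicious
    pool = {0: ((0.5,), (0.2,)), 1: ((9.5,), (9.8,)), 2: ((5.0,), (5.0,)), 3: ((1.0,), (9.0,))}
    truth = {0: 0, 1: 1, 2: 1, 3: 1}  # simulated expert labels
    models, queried = active_learning(seed, [0, 1], pool, truth.__getitem__, rounds=1, k=2)
    print(queried)
```

On the toy data, the first round queries sample 2 (ambiguous in both views) and sample 3 (the two views disagree), while samples 0 and 1, which both views already classify confidently, are never sent to the annotator. This is the behavior the abstract relies on to keep the number of human annotations small.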
