☆ 4.7 Article

Identifying malicious social media contents using multi-view Context-Aware active learning

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2019)

Journal

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE

Volume 100, Issue -, Pages 365-379

Publisher

ELSEVIER

DOI: 10.1016/j.future.2019.03.015

Keywords

Graph search; Multi-view classification; Active learning; Malicious tweets classification; Semi-supervised learning

Category

Computer Science, Theory & Methods

Abstract

This paper presents a semi-supervised, multi-view, active learning method that uses an optimized set of the most informative samples and domain-specific context information to efficiently and effectively identify malicious forum content on web-based social media platforms. As research shows, the automated identification of malicious forum posts, which also helps in detecting their associated key suspects in web forums, faces numerous challenges: (1) online data, particularly social media data, originate from diverse and heterogeneous sources and are largely unstructured; (2) online data characteristics evolve quickly; and (3) there is limited ground truth data to support the development of effective classification technologies in a strictly supervised scenario. To address these challenges, the proposed human-machine collaborative, semi-supervised learning method is designed to efficiently and effectively identify harmful, provocative, or fabricated forum content by observing only a small number of annotated samples. Our learning framework starts by building initial view-dependent classifiers from a limited labeled data collection and allows each, in an interactive manner, to evolve dynamically into a sophisticated model by observing data patterns from a shared shortlist of the most informative samples, identified via a graph-based optimization method and solved by a maximum flow algorithm. By designing a context-rich metric definition in a data-driven manner, the proposed framework is able to learn a sufficiently robust classification model that uses only a small number of human-annotated samples, typically 1-2 orders of magnitude fewer than a fully supervised solution. We validate our method using a large collection of flagged words with a wide range of origins, words that frequently appear in web-based forums and were manually verified by multiple experienced, independent domain experts. © 2019 Elsevier B.V. All rights reserved.
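
To make the idea of a "shared shortlist of most informative samples" more concrete, the following is a minimal sketch of multi-view query selection. It is not the paper's method: the abstract describes a graph-based optimization solved by a maximum flow algorithm, whereas this sketch substitutes a much simpler heuristic that ranks unlabeled samples by average per-view uncertainty plus cross-view disagreement. The function names `binary_entropy` and `select_queries`, the scoring formula, and the input layout are all assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_queries(view_probs, k):
    """Return indices of the k unlabeled samples to send to a human annotator.

    view_probs[v][i] is a hypothetical probability that the classifier for
    view v assigns the "malicious" label to unlabeled sample i.
    NOTE: this scoring rule is an illustrative stand-in, not the
    max-flow-based selection described in the abstract.
    """
    n = len(view_probs[0])
    scores = []
    for i in range(n):
        probs = [view[i] for view in view_probs]
        # Average uncertainty of the individual views on this sample.
        mean_uncertainty = sum(binary_entropy(p) for p in probs) / len(probs)
        # Spread between the most and least confident "malicious" votes.
        disagreement = max(probs) - min(probs)
        scores.append((mean_uncertainty + disagreement, i))
    # Highest score first; ties are broken by the lower sample index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


if __name__ == "__main__":
    # Two views, four unlabeled samples (made-up probabilities).
    views = [
        [0.90, 0.50, 0.10, 0.95],
        [0.90, 0.50, 0.90, 0.05],
    ]
    print(select_queries(views, 2))
```

In an active learning loop of the kind the abstract outlines, the selected samples would be labeled by an annotator, added to the labeled pool, and each view's classifier retrained before the next round of selection.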
