Journal
Publisher
SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-030-67670-4_5
Keywords
Batch-mode active learning; Imbalance data; Hate-speech recognition
Category
Funding
- Indonesia Endowment Fund for Education (LPDP)
- Ministry of Research, Technology and Higher Education of the Republic of Indonesia (BUDI-LN Scholarship)
- Dutch Research Council [NWO628 003 001]
As social media has become a fixture of daily life, the problem of hostile content and hate-speech has grown, making automatic hate-speech detection necessary. A novel partition-based batch mode active learning framework is proposed to address the challenge of high class-skew, and extensive experiments show substantial improvements in detection performance.
While social media has taken a fixed place in our daily life, its steadily growing prominence also exacerbates the problem of hostile content and hate-speech. These destructive phenomena call for automatic hate-speech detection, which, however, faces two major challenges, namely i) the dynamic nature of online content, which causes significant data-drift over time, and ii) a high class-skew, as hate-speech represents a relatively small fraction of the overall online content. The first challenge naturally calls for a batch mode active learning solution, which updates the detection system by querying human domain-experts to annotate meticulously selected batches of data instances. However, little prior work exists on batch mode active learning with high class-skew, and in particular for the problem of hate-speech detection. In this work, we propose a novel partition-based batch mode active learning framework to address this problem. Our framework falls into the so-called screening approach, which pre-selects a subset of the most uncertain data items and then selects a representative set from this uncertainty space. To tackle the class-skew problem, we use a data-driven skew-specialized cluster representation, with a higher potential to cherry-pick minority classes. In extensive experiments we demonstrate substantial improvements in terms of G-Means and F1 measure over several baseline approaches and multiple datasets, for highly imbalanced class ratios.
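The screening approach described in the abstract can be illustrated with a minimal sketch. It assumes a binary classifier that outputs a positive-class probability per item and a numeric feature vector per item. All function names (`screen_most_uncertain`, `kmeans`, `select_batch`) and parameters (`pool_size`, `batch_size`) are hypothetical, and plain k-means is used only as a generic stand-in: the paper's skew-specialized cluster representation is not reproduced here.

```python
import random


def uncertainty(p_pos):
    """Binary uncertainty score: 1.0 at p = 0.5, falling to 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_pos - 1.0)


def screen_most_uncertain(probs, pool_size):
    """Screening step: indices of the pool_size most uncertain items."""
    ranked = sorted(range(len(probs)),
                    key=lambda i: uncertainty(probs[i]), reverse=True)
    return ranked[:pool_size]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (assumption: a generic stand-in partitioning,
    NOT the skew-specialized clustering proposed in the paper)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Recompute so the returned assignment matches the final centroids.
    return centroids, [nearest(p, centroids) for p in points]


def select_batch(features, probs, pool_size, batch_size):
    """Screen the uncertain pool, partition it, and return one
    representative (the item nearest its centroid) per partition."""
    pool = screen_most_uncertain(probs, pool_size)
    centroids, assignment = kmeans([features[i] for i in pool], batch_size)
    batch = []
    for c in range(batch_size):
        members = [i for i, a in zip(pool, assignment) if a == c]
        if members:
            batch.append(min(
                members,
                key=lambda i: squared_distance(features[i], centroids[c])))
    return sorted(batch)


# Toy data: two groups of uncertain items plus two confidently classified ones.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5], [50, 50]]
probs = [0.5, 0.45, 0.55, 0.5, 0.48, 0.52, 0.99, 0.01]
batch = select_batch(features, probs, pool_size=6, batch_size=2)
```

In this toy run, the two confidently classified items are screened out, and the batch sent for annotation holds one item from each group of uncertain items rather than several near-duplicates from a single group.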