3.8 Proceedings Paper

PS3: Partition-Based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-030-67670-4_5

Keywords

Batch-mode active learning; Imbalance data; Hate-speech recognition

Funding

  1. Indonesia Endowment Fund for Education (LPDP)
  2. Ministry of Research, Technology and Higher Education of the Republic of Indonesia (BUDI-LN Scholarship)
  3. Dutch Research Council [NWO628 003 001]


As social media has become a fixture of daily life, hostile content and hate speech have become a growing problem, making automatic hate-speech detection necessary. The paper proposes a novel partition-based batch mode active learning framework to address the challenge of high class-skew, and reports substantial improvements in detection performance in extensive experiments.
While social media has taken a fixed place in our daily life, its steadily growing prominence also exacerbates the problem of hostile content and hate speech. These destructive phenomena call for automatic hate-speech detection, which, however, faces two major challenges: i) the dynamic nature of online content, which causes significant data drift over time, and ii) a high class-skew, as hate speech represents a relatively small fraction of the overall online content. The first challenge naturally calls for a batch mode active learning solution, which updates the detection system by querying human domain experts to annotate meticulously selected batches of data instances. However, little prior work exists on batch mode active learning with high class-skew, and in particular for the problem of hate-speech detection. In this work, we propose a novel partition-based batch mode active learning framework to address this problem. Our framework falls into the so-called screening approach, which pre-selects a subset of the most uncertain data items and then selects a representative set from this uncertainty space. To tackle the class-skew problem, we use a data-driven skew-specialized cluster representation, with a higher potential to cherry-pick minority classes. In extensive experiments we demonstrate substantial improvements in terms of G-Means and F1 measure over several baseline approaches and multiple datasets, for highly imbalanced class ratios.
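
To make the screening idea in the abstract concrete, the following is a minimal, generic sketch and not the authors' PS3 implementation. It assumes a binary classifier that outputs a positive-class probability per item, screens the items closest to 0.5, partitions them with plain k-means, and picks the item nearest each cluster center. The skew-specialized cluster representation is not reproduced here. All function names and parameters (`screen_most_uncertain`, `select_batch`, `pool_size`, `batch_size`) are hypothetical. The `g_mean` helper uses the common binary definition, the square root of sensitivity times specificity, which may differ in detail from the paper's evaluation setup.

```python
import math
import random


def screen_most_uncertain(proba_pos, pool_size):
    """Indices of the pool_size items whose P(positive) is closest to 0.5."""
    order = sorted(range(len(proba_pos)), key=lambda i: abs(proba_pos[i] - 0.5))
    return order[:pool_size]


def _nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on a list of feature vectors; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [_nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [_nearest(p, centers) for p in points]
    return centers, labels


def select_batch(features, proba_pos, pool_size, batch_size, seed=0):
    """Screen uncertain items, cluster them, and return one item per cluster.

    Assumes pool_size >= batch_size. The result may hold fewer than
    batch_size indices if a cluster ends up empty.
    """
    pool = screen_most_uncertain(proba_pos, pool_size)
    pool_points = [features[i] for i in pool]
    centers, labels = kmeans(pool_points, batch_size, seed=seed)
    batch = []
    for j, center in enumerate(centers):
        members = [i for i, lab in zip(pool, labels) if lab == j]
        if members:
            # The item nearest the center stands in for the whole cluster.
            batch.append(min(members, key=lambda i: math.dist(features[i], center)))
    return batch


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels.

    Assumes both classes occur in y_true.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(p == 1 for p in pos) / len(pos)
    tnr = sum(p == 0 for p in neg) / len(neg)
    return math.sqrt(tpr * tnr)
```

In an active learning loop, the indices returned by `select_batch` would be sent to annotators and the classifier retrained on the enlarged labeled set. G-mean is a natural metric under class skew because it drops to zero when either class is entirely misclassified, unlike plain accuracy.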
