☆ 3.8 Proceedings Paper

PS3: Partition-Based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V (2021)

期刊

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V

卷 12461, 期 -, 页码 68-84

出版社

SPRINGER INTERNATIONAL PUBLISHING AG

DOI: 10.1007/978-3-030-67670-4_5

关键词

Batch-mode active learning; Imbalance data; Hate-speech recognition

类别

Computer Science, Artificial Intelligence Computer Science, Information Systems Computer Science, Interdisciplinary Applications

资金

Indonesia Endowment Fund for Education (LPDP)
Ministry of Research, Technology and Higher Education of the Republic of Indonesia (BUDI-LN Scholarship)
Dutch Research Council [NWO628 003 001]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

As social media plays a fixed role in our daily life, the issue of hostile contents and hate-speech is exacerbated, necessitating automatic hate-speech detection. A novel partition-based batch mode active learning framework is proposed to address the challenges of high class-skew, demonstrating substantial improvements in detection performance through extensive experiments.

While social media has taken a fixed place in our daily life, its steadily growing prominence also exacerbates the problem of hostile contents and hate-speech. These destructive phenomena call for automatic hate-speech detection, which, however, is facing two major challenges, namely i) the dynamic nature of online content causing significant data-drift over time, and ii) a high class-skew, as hate-speech represents a relatively small fraction of the overall online content. The first challenge naturally calls for a batch mode active learning solution, which updates the detection system by querying human domain-experts to annotate meticulously selected batches of data instances. However, little prior work exists on batch mode active learning with high class-skew, and in particular for the problem of hate-speech detection. In this work, we propose a novel partition-based batch mode active learning framework to address this problem. Our framework falls into the so-called screening approach, which pre-selects a subset of most uncertain data items and then selects a representative set from this uncertainty space. To tackle the classs-kew problem, we use a data-driven skew-specialized cluster representation, with a higher potential to cherry pick minority classes. In extensive experiments we demonstrate substantial improvements in terms of G-Means, and F1 measure, over several baseline approaches and multiple datasets, for highly imbalanced class ratios.

PS3: Partition-Based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

期刊

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V

出版社

SPRINGER INTERNATIONAL PUBLISHING AG

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

PS3: Partition-Based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

期刊

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V

出版社

SPRINGER INTERNATIONAL PUBLISHING AG

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文