4.1 Article

Improving Gender-Related Fairness in Sentence Encoders: A Semantics-Based Approach

Journal

Data Science and Engineering
Volume -, Issue -, Pages -

Publisher

Springer Nature
DOI: 10.1007/s41019-023-00211-0

Keywords

Gender bias; Word embeddings; Sentence encoders; Ethics of NLP; Data augmentation

The increasing use of semantic text analysis systems has made natural language understanding a crucial task. However, these systems often display social bias and lack transparency; gender bias in particular reinforces social stereotypes. This study proposes a new metric, called bias score, to measure gender bias in sentence embeddings. Experimental results show that the metric can identify gender-stereotyped sentences and help reduce bias in the text corpora used to train sentence encoders, improving fairness while preserving accuracy in natural language understanding tasks. The study also compares the proposed approach with traditional methods for reducing bias in embedding-based language models.
The ever-increasing number of systems based on semantic text analysis is making natural language understanding a fundamental task: embedding-based language models are used for a variety of applications, such as resume parsing or improving web search results. At the same time, despite their popularity and widespread use, concern is rapidly growing over their display of social bias and lack of transparency. In particular, they exhibit a large amount of gender bias, favouring the consolidation of social stereotypes. Recently, sentence embeddings have been introduced as a novel and powerful technique to represent entire sentences as vectors. We propose a new metric to estimate gender bias in sentence embeddings, named bias score. Our solution leverages the semantic importance of words and previous research on bias in word embeddings, and it is able to distinguish between neutral and biased gender information at the sentence level. Experiments on a real-world dataset demonstrate that our novel metric can identify gender-stereotyped sentences. Furthermore, we employ bias score to detect and then remove or compensate for the most stereotyped entries in text corpora used to train sentence encoders, improving their degree of fairness. Finally, we show that models retrained on fairer corpora are less prone to making stereotypical associations than their original counterparts, while preserving accuracy in natural language understanding tasks. Additionally, we compare our approach with traditional methods for reducing bias in embedding-based language models.
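
The abstract does not give the exact formulation of bias score, so the sketch below only illustrates the general idea under explicitly assumed choices: word-level gender bias is taken as the projection of a word's embedding onto a gender direction (in the spirit of prior work on bias in word embeddings), each word is weighted by a semantic-importance score such as TF-IDF, and words carrying definitional (i.e. neutral) gender information are skipped. The function names, the TF-IDF weighting, and the threshold-based filtering are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

# Assumption: `emb` maps lower-cased words to fixed-dimensional numpy vectors
# (e.g. pre-trained GloVe/word2vec) and `importance` maps words to semantic
# importance weights (e.g. TF-IDF); neither is specified by the abstract.
DEFINITIONAL_PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("son", "daughter")]
DEFINITIONAL_WORDS = {w for pair in DEFINITIONAL_PAIRS for w in pair}

def gender_direction(emb):
    """Average the normalised difference vectors of definitional word pairs."""
    diffs = []
    for m, f in DEFINITIONAL_PAIRS:
        if m in emb and f in emb:
            d = emb[m] - emb[f]
            diffs.append(d / np.linalg.norm(d))
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def sentence_bias_score(tokens, emb, importance, g):
    """Importance-weighted average of word-level gender bias (|projection onto g|),
    skipping words whose gender information is definitional rather than stereotypical."""
    num, den = 0.0, 0.0
    for tok in tokens:
        t = tok.lower()
        if t in DEFINITIONAL_WORDS or t not in emb:
            continue
        w = importance.get(t, 0.0)
        num += w * abs(float(np.dot(emb[t], g)))
        den += w
    return num / den if den > 0.0 else 0.0

def debias_corpus(corpus, emb, importance, threshold=0.15, swap=None):
    """Keep low-bias sentences; for high-bias ones, either drop them or, if a
    gender-swapping function `swap` is supplied, also add the swapped counterpart
    (counterfactual data augmentation) to compensate. The threshold value is arbitrary."""
    g = gender_direction(emb)
    cleaned = []
    for sent in corpus:
        if sentence_bias_score(sent.split(), emb, importance, g) <= threshold:
            cleaned.append(sent)
        elif swap is not None:
            cleaned.append(sent)
            cleaned.append(swap(sent))
    return cleaned
```

A sentence encoder would then be retrained on the output of something like debias_corpus; whether stereotyped entries are removed or compensated for, and at what threshold, are design choices the paper evaluates rather than fixed constants.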
