Journal
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Volume 36, Issue 7, Pages 3217-3258
Publisher
WILEY
DOI: 10.1002/int.22415
Keywords
algorithmic bias; confirmation bias; data imputation; fairness; missing values; sample bias; survey bias
Funding
- Ministerio de Economía, Industria y Competitividad, Gobierno de España (ES) [RTI2018-094403-B-C3]
- Generalitat Valenciana [PROMETEO/2019/09]
- Future of Life Institute [RFP2-15]
- European Commission
This paper analyzes the relationship between missing values and algorithmic fairness in machine learning, finding that rows containing missing values are usually fairer than the rest, and that how missing values are handled affects the trade-off between fairness and performance.
Nowadays, there is increasing concern in machine learning about the causes underlying unfair decision making, that is, algorithmic decisions discriminating against some groups over others, especially groups defined over protected attributes such as gender, race and nationality. Missing values are one frequent manifestation of all these latent causes: protected groups are more reluctant to give information that could be used against them, sensitive information for some groups can be erased by human operators, or data acquisition may simply be less complete and systematic for minority groups. However, most recent techniques, libraries and experimental results dealing with fairness in machine learning have simply ignored missing data. In this paper, we present the first comprehensive analysis of the relation between missing values and algorithmic fairness for machine learning: (1) we analyse the sources of missing data and bias, mapping the common causes, (2) we find that rows containing missing values are usually fairer than the rest, which should discourage treating missing values as uncomfortable, ugly data that techniques and libraries for handling algorithmic bias discard at the first opportunity, (3) we study the trade-off between performance and fairness when the rows with missing values are used (either because the technique deals with them directly or through imputation methods), and (4) we show that the sensitivity of six different machine-learning techniques to missing values is usually low, which reinforces the view that the rows with missing data contribute more to fairness through the other, nonmissing, attributes. We end the paper with a series of recommended procedures about what to do with missing data when aiming for fair decision making.
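The comparison in finding (2) can be illustrated with a toy sketch: split a dataset into rows with and without missing values, then compute a group-fairness metric (here, the demographic parity difference) on each subset. The data, the biased predictor, and all names below are hypothetical and only mimic the kind of pattern the abstract describes; they are not the paper's experiments.

```python
import random

def demographic_parity_diff(rows):
    """Absolute difference in positive-prediction rate between the
    two protected groups (0 and 1). Smaller values mean fairer."""
    rates = {}
    for g in (0, 1):
        grp = [r for r in rows if r["group"] == g]
        rates[g] = sum(r["pred"] for r in grp) / len(grp)
    return abs(rates[0] - rates[1])

random.seed(0)

# Synthetic dataset: each row has a protected attribute, a model
# prediction, and a feature that may be missing (None). The toy
# predictor is biased against group 1, but less so on rows where
# the feature is missing -- mimicking the pattern described above.
rows = []
for _ in range(1000):
    g = random.randint(0, 1)
    missing = random.random() < 0.3
    bias = 0.02 if missing else 0.10
    pred = int(random.random() < (0.5 - bias * g))
    rows.append({"group": g, "pred": pred,
                 "feature": None if missing else 1.0})

with_missing = [r for r in rows if r["feature"] is None]
complete = [r for r in rows if r["feature"] is not None]

print("fairness gap, rows with missing values:",
      demographic_parity_diff(with_missing))
print("fairness gap, complete rows:",
      demographic_parity_diff(complete))
```

In a real analysis the split would come from actual missingness in the data and the predictions from a trained model, but the per-subset metric computation is the same.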
Authors