4.6 Review

Documentation to facilitate communication between dataset creators and consumers

Journal

COMMUNICATIONS OF THE ACM
Volume 64, Issue 12, Pages 86-92

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3458723

Data plays a critical role in machine learning: mismatches between a dataset and a model's deployment context can lead to poor model behavior and to the amplification of unwanted societal biases. The World Economic Forum suggests documenting the provenance, creation, and use of machine learning datasets to avoid discriminatory outcomes.
Data plays a critical role in machine learning. Every machine learning model is trained and evaluated using data, quite often in the form of static datasets. The characteristics of these datasets fundamentally influence a model's behavior: a model is unlikely to perform well in the wild if its deployment context does not match its training or evaluation datasets, or if these datasets reflect unwanted societal biases. Mismatches like this can have especially severe consequences when machine learning models are used in high-stakes domains, such as criminal justice,(1,13,24) hiring,(19) critical infrastructure,(11,21) and finance.(18) Even in other domains, mismatches may lead to loss of revenue or public relations setbacks. Of particular concern are recent examples showing that machine learning models can reproduce or amplify unwanted societal biases reflected in training datasets.(4,5,12) For these and other reasons, the World Economic Forum suggests all entities should document the provenance, creation, and use of machine learning datasets to avoid discriminatory outcomes.(25) Although data provenance has been studied
