4.1 Article

DescribeML: A dataset description tool for machine learning



Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata

Amy K. Heger et al.

Proceedings of the ACM on Human-Computer Interaction (2022)

Review Computer Science, Hardware & Architecture

Documentation to facilitate communication between dataset creators and consumers

Timnit Gebru et al.

Summary: Data plays a critical role in machine learning: mismatched datasets can lead to negative model behaviors and amplify societal biases. The World Economic Forum suggests documenting the origin, creation, and use of machine learning datasets to prevent discriminatory outcomes.


Review Computer Science, Artificial Intelligence

Data and its (dis)contents: A survey of dataset development and use in machine learning research

Amandalynne Paullada et al.

Summary: The work surveys literature that highlights limitations in how machine learning datasets are collected and used, focusing on negative societal impacts and effects on system performance. It covers approaches to mitigating bias in datasets and advocates combining qualitative and quantitative methods for careful documentation and analysis during dataset creation and use.