4.6 Article

Recognizability bias in citizen science photographs

Journal

ROYAL SOCIETY OPEN SCIENCE
Volume 10, Issue 2, Pages -

Publisher

ROYAL SOC
DOI: 10.1098/rsos.221063

Keywords

citizen science; image recognition; machine learning; recognizability

Citizen science and automated collection methods rely on image recognition for observational data, but recognition models also require large amounts of data, creating a feedback loop. Harder-to-recognize species tend to be under-reported and less prevalent in training data, hampering training for challenging species. This study found a 'recognizability bias' across multiple taxa, where species easily identified by humans and models are more prevalent in available image data, regardless of picture quality or biological traits. This has implications for training future models with more data.
Citizen science and automated collection methods increasingly depend on image recognition to provide the volume of observational data that research and management need. Recognition models, in turn, require large amounts of data from these same sources, creating a feedback loop between the methods and the tools. Species that are harder to recognize, for both humans and machine learning algorithms, are likely to be under-reported and thus less prevalent in the training data. As a result, the feedback loop may hamper training most for the species that already pose the greatest challenge. In this study, we trained recognition models for various taxa and found evidence for a 'recognizability bias', where species that are more readily identified by humans and recognition models alike are more prevalent in the available image data. This pattern is present across multiple taxa and does not appear to relate to differences in picture quality, biological traits or data collection metrics other than recognizability. This has implications for the expected performance of future models trained with more data, including data on such challenging species.
