Journal
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume 106
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2021.104474
Keywords
Crowdsourcing; Label integration; Worker quality; Differential evolution
Funding
- National Natural Science Foundation of China [U1711267]
- Fundamental Research Funds for the Central Universities [CUGGC03]
- Foundation of Key Laboratory of Artificial Intelligence, Ministry of Education, China [AI2020002]
Crowdsourcing is an efficient way to obtain labeled data, but the labeling quality of crowd workers directly affects the quality of that data. The few existing label integration strategies that account for a worker's quality varying across instances rely only on statistics of the noisy labels, so their estimates are rough and sub-optimal, and they handle only binary classification problems. The proposed differential evolution-based weighted soft majority voting strategies significantly outperform existing state-of-the-art strategies in experiments.
Crowdsourcing has attracted considerable attention in recent years. Crowdsourcing platforms make it possible to obtain a large amount of labeled data efficiently and cheaply. The labeling quality of crowd workers directly influences the quality of the labeled data. A small number of label integration strategies have recently recognized that the quality of a crowd worker differs across the instances being labeled, but they estimate that quality using only the statistical characteristics of the multiple noisy labels, so their estimates are rough and sub-optimal. In addition, they can only deal with binary classification problems, which restricts the practical applications of crowdsourcing. To solve both issues simultaneously, we propose three differential evolution-based weighted soft majority voting strategies for multi-class classification. In the proposed strategies, a differential evolution (DE) algorithm estimates the quality of crowd workers labeling different instances by minimizing the Error, Gini and Entropy of the weighted multiple noisy labels. Extensive experimental results on simulated and real-world datasets show that the proposed strategies significantly outperform all other existing state-of-the-art label integration strategies.
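The general idea of DE-driven weighted soft majority voting can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulation, so the sketch makes several simplifying assumptions. It learns one weight per worker (the paper estimates quality per worker and instance), it uses a textbook DE/rand/1/bin loop, and it implements only Entropy and Gini objectives because the definition of the Error objective is not stated here. All function names, parameters, bounds, and the toy data are hypothetical.

```python
import math
import random

def soft_labels(labels, weights, n_classes):
    """Weighted soft majority vote: one class distribution per instance."""
    dists = []
    for row in labels:
        mass = [0.0] * n_classes
        for worker, label in enumerate(row):
            if label is not None:  # None: this worker did not label the instance
                mass[label] += weights[worker]
        total = sum(mass)
        dists.append([m / total for m in mass] if total > 0
                     else [1.0 / n_classes] * n_classes)
    return dists

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

def gini(dist):
    return 1.0 - sum(p * p for p in dist)

def mean_impurity(labels, weights, n_classes, impurity=entropy):
    """Objective: average impurity of the weighted soft labels."""
    dists = soft_labels(labels, weights, n_classes)
    return sum(impurity(d) for d in dists) / len(dists)

def de_minimize(objective, dim, bounds=(0.05, 1.0), pop_size=20,
                generations=60, f=0.5, cr=0.9, seed=0):
    """Textbook DE/rand/1/bin with greedy selection (assumed settings)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = [hi] * dim  # equal weights, i.e. plain soft majority voting
    fit = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(hi, max(lo, v)))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            t_fit = objective(trial)
            if t_fit <= fit[i]:  # keep the trial only if it is no worse
                pop[i], fit[i] = trial, t_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def integrate(labels, n_classes, impurity=entropy, seed=0):
    """Learn worker weights with DE, then take the argmax of each soft label."""
    n_workers = len(labels[0])
    weights, score = de_minimize(
        lambda w: mean_impurity(labels, w, n_classes, impurity),
        n_workers, seed=seed)
    dists = soft_labels(labels, weights, n_classes)
    integrated = [max(range(n_classes), key=d.__getitem__) for d in dists]
    return integrated, weights, score

# Toy data (hypothetical): 6 instances, 4 workers, 3 classes.
labels = [
    [0, 0, 0, 1],
    [1, 1, 1, 2],
    [2, 2, None, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [2, 2, 2, 2],
]
integrated, weights, score = integrate(labels, n_classes=3)
print(integrated, [round(w, 2) for w in weights], round(score, 4))
```

Seeding the population with equal weights means the DE result is never worse, under the chosen objective, than unweighted soft majority voting. The lower bound on the weights is a design choice of this sketch: without it, an impurity objective can be driven to zero by placing all weight on a single worker. Passing `impurity=gini` to `integrate` swaps in the Gini objective.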