Proceedings Paper

Crowdsourced Text Sequence Aggregation based on Hybrid Reliability and Representation

PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20) (2020)

Pages 1761-1764

Publisher

Association for Computing Machinery (ACM)

DOI: 10.1145/3397271.3401239

Keywords

crowdsourcing; text sequence aggregation; reliability

Funding

JSPS KAKENHI [19K20277]
Grants-in-Aid for Scientific Research [19K20277] Funding Source: KAKEN

Abstract

Crowd workers are cheaper and easier to access than an oracle for collecting the ground-truth data used to train and evaluate models. To ensure the quality of crowdsourced data, one can assign multiple crowd workers to each question and then aggregate their answers, which vary in quality, into a single golden answer. In IR and NLP, the ground truth for many tasks consists of text sequences. Existing answer aggregation methods were proposed for labels (e.g., categories); when adapted to aggregate crowdsourced text sequences of diverse quality, they focus on only one-sided reliability and do not fully exploit the rich information in text sequences. We therefore propose a crowdsourced text sequence aggregation method that captures hybrid reliability information, i.e., the local question-wise reliability of text answers and the global dataset-wise reliability of crowd workers. For local reliability, the method also incorporates text similarities computed from a hybrid representation, i.e., text embeddings and word sequences. Experiments on real crowdsourced datasets show that our method outperforms baselines that use only one-sided reliability and one-sided representation. Our method can effectively leverage the rich information in text sequences.
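To make the idea of hybrid reliability and hybrid representation more concrete, the following is a minimal illustrative sketch, not the authors' model. It assumes: a bag-of-words count vector as a stand-in for a learned text embedding; `difflib.SequenceMatcher` over word lists as the word-sequence similarity; local reliability defined as an answer's mean similarity to the other answers to the same question; global reliability defined as a worker's mean local reliability across the dataset; and a hypothetical linear mix of the two (weights `alpha` and `beta`) used to select one existing answer per question. All function names and parameters here are hypothetical.

```python
import math
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def embed(text):
    # ASSUMPTION: bag-of-words counts stand in for a learned text embedding.
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def hybrid_similarity(a, b, alpha=0.5):
    # Hybrid representation: mix embedding similarity with word-sequence similarity.
    emb_sim = cosine(embed(a), embed(b))
    seq_sim = SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
    return alpha * emb_sim + (1 - alpha) * seq_sim


def aggregate(answers, alpha=0.5, beta=0.5):
    """answers: {question_id: {worker_id: text}} -> {question_id: selected text}."""
    # Local (question-wise) reliability: mean similarity to the other answers.
    local = {}
    for q, by_worker in answers.items():
        for w, text in by_worker.items():
            others = [t for w2, t in by_worker.items() if w2 != w]
            local[(q, w)] = (
                sum(hybrid_similarity(text, o, alpha) for o in others) / len(others)
                if others else 0.0
            )
    # Global (dataset-wise) reliability: each worker's mean local reliability.
    per_worker = defaultdict(list)
    for (_, w), r in local.items():
        per_worker[w].append(r)
    global_rel = {w: sum(rs) / len(rs) for w, rs in per_worker.items()}
    # Hybrid reliability: combine both signals, keep the best answer per question.
    result = {}
    for q, by_worker in answers.items():
        best = max(by_worker, key=lambda w: beta * local[(q, w)] + (1 - beta) * global_rel[w])
        result[q] = by_worker[best]
    return result


if __name__ == "__main__":
    answers = {
        "q1": {
            "w1": "the cat sat on the mat",
            "w2": "the cat sat on a mat",
            "w3": "a cat sat on the mat",
        },
    }
    print(aggregate(answers)["q1"])  # prints: the cat sat on the mat
```

Selecting an existing answer keeps the output a well-formed text sequence. With a single question, global reliability equals local reliability; across many questions, the global term lets a consistently agreeing worker outweigh a single noisy one.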
