Proceedings Paper

Crowdsourced Text Sequence Aggregation based on Hybrid Reliability and Representation

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3397271.3401239

Keywords

crowdsourcing; text sequence aggregation; reliability

Funding

  1. JSPS KAKENHI [19K20277]
  2. Grants-in-Aid for Scientific Research [19K20277] Funding Source: KAKEN

Crowd workers are cheaper and easier to access than an oracle for collecting the ground truth data used to train and evaluate models. To ensure the quality of crowdsourced data, one can assign multiple crowd workers to each question and then aggregate their answers, which vary in quality, into a single golden answer. In IR and NLP, the ground truth data of many tasks are text sequences. Existing answer aggregation methods were proposed for labels (e.g., categories); when adapted to aggregate crowdsourced text sequences, they focus on only one-sided reliability and do not fully utilize the rich information in text sequences. We therefore propose a crowdsourced text sequence aggregation method that captures hybrid reliability information, i.e., the local question-wise reliability of text answers and the global dataset-wise reliability of crowd workers. For the local reliability, it also incorporates text similarities from a hybrid representation, i.e., text embeddings and word sequences. Experiments on real crowdsourced datasets show that our method outperforms baselines that utilize only one-sided reliability and one-sided representation. Our method can effectively leverage the rich information in text sequences.
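The idea of combining local and global reliability can be illustrated with a small sketch. This is not the paper's algorithm, only a simplified stand-in under stated assumptions: a bag-of-words cosine replaces real text embeddings, `difflib.SequenceMatcher` over tokens plays the role of word-sequence similarity, the two are averaged with equal weight, and the final score is simply local reliability times worker reliability. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from math import sqrt


def sequence_similarity(a, b):
    """Order-aware similarity over word sequences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def vector_similarity(a, b):
    """Cosine of bag-of-words counts (assumed stand-in for text embeddings)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def hybrid_similarity(a, b):
    """Equal-weight mix of the two representations (the weighting is an assumption)."""
    return 0.5 * (sequence_similarity(a, b) + vector_similarity(a, b))


def aggregate(answers):
    """answers: {question: [(worker, text), ...]} -> {question: selected text}."""
    local = {}                      # (question, answer index) -> local reliability
    per_worker = defaultdict(list)  # worker -> local scores of all their answers
    for q, items in answers.items():
        for i, (worker, text) in enumerate(items):
            others = [t for j, (_, t) in enumerate(items) if j != i]
            # Local reliability: mean similarity to the other answers of this question.
            score = sum(hybrid_similarity(text, t) for t in others) / len(others) if others else 0.0
            local[(q, i)] = score
            per_worker[worker].append(score)
    # Global reliability: a worker's mean local score across the whole dataset.
    global_rel = {w: sum(s) / len(s) for w, s in per_worker.items()}
    result = {}
    for q, items in answers.items():
        best = max(range(len(items)), key=lambda i: local[(q, i)] * global_rel[items[i][0]])
        result[q] = items[best][1]
    return result
```

Calling `aggregate` on a dictionary that maps each question to a list of `(worker, text)` pairs returns one selected answer per question. An answer that shares no words with its peers receives a local score of zero and is therefore never preferred over answers that agree with each other.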
