Article

Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval

KNOWLEDGE-BASED SYSTEMS (2022)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 244

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2022.108545

Keywords

Cross-lingual information retrieval; Language model; Optimal transport; Result interpretability; Natural language processing

Category

Computer Science, Artificial Intelligence

Funding

Slovenian Research Agency
European Union [H2020-ICT-952026]

Summary

This paper proposes LM-EMD, a novel learning-to-rank model that uses a multilingual BERT language model and Earth Mover's Distance (EMD) to measure the relevance of a document to an input query. The model provides interpretable insights by analyzing the token distances and identifying which document tokens contribute to the relevance.

Abstract

Modern cross-lingual document retrieval models can find documents relevant to a query, but they cannot explain why a document is relevant. This paper proposes LM-EMD, a novel learning-to-rank model that uses the multilingual BERT language model and Earth Mover's Distance (EMD) to measure a document's relevance to the input query and to provide interpretable insights into why the document is relevant. The model uses contextual embeddings of the query and document tokens, generated with multilingual BERT, to measure the distances between tokens in the embedding space. EMD then uses these distances to calculate the document's relevance score and to identify which document tokens contribute most to its relevance. We evaluate the model on five language pairs with varying degrees of similarity and analyze its performance. We find that the model (1) performs similarly to the best-performing baseline model on high-resource languages, (2) is less effective on low-resource languages, and (3) provides insight into why a document is relevant to the query. © 2022 The Author(s). Published by Elsevier B.V.
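The EMD-based scoring step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal illustration that rests on several assumptions:

- Toy 2-D vectors stand in for the multilingual BERT contextual embeddings.
- Token distances are cosine distances.
- Query and document tokens carry uniform mass.
- The transport plan is approximated with entropy-regularized optimal transport (Sinkhorn iterations) rather than solved exactly.
- The relevance score is taken as the negative transport cost.
- A document token's "contribution" is its share of that cost, so a lower value means a closer match.

The function names `cosine_distance`, `sinkhorn_plan` and `relevance` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sinkhorn_plan(cost, reg=0.05, iters=500):
    """Approximate the EMD transport plan with Sinkhorn iterations (assumption).

    Query tokens (rows) and document tokens (columns) carry uniform mass.
    A smaller `reg` gets closer to exact EMD, but it must stay well above
    max(cost) / 700 or exp() underflows to zero.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each query token
    b = [1.0 / m] * m  # mass on each document token
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def relevance(query_emb, doc_emb, reg=0.05):
    """Return (score, per-document-token cost share); higher score = more relevant."""
    cost = [[cosine_distance(q, d) for d in doc_emb] for q in query_emb]
    plan = sinkhorn_plan(cost, reg)
    n, m = len(query_emb), len(doc_emb)
    # Cost share of document token j: mass it receives times distance travelled.
    contrib = [sum(plan[i][j] * cost[i][j] for i in range(n)) for j in range(m)]
    transport_cost = sum(contrib)  # approximate EMD between query and document
    return -transport_cost, contrib


# Hypothetical toy vectors standing in for multilingual BERT token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.1], [0.1, 1.0]]    # tokens close to the query tokens
doc_b = [[-1.0, 0.2], [0.3, -1.0]]  # tokens far from the query tokens

score_a, contrib_a = relevance(query, doc_a)
score_b, contrib_b = relevance(query, doc_b)
print("score A:", round(score_a, 3), "score B:", round(score_b, 3))
# Lowest cost share first: the document tokens matched most cheaply.
print("doc A tokens by contribution:", sorted(range(len(doc_a)), key=lambda j: contrib_a[j]))
```

Sinkhorn was chosen here only because it fits in a few dependency-free lines. An exact linear-programming or network-simplex EMD solver would remove the bias introduced by the regularization term `reg`.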
