Article

Assisting Multimodal Named Entity Recognition by cross-modal auxiliary tasks

Journal

PATTERN RECOGNITION LETTERS
Volume 175, Issue -, Pages 52-58

Publisher

ELSEVIER
DOI: 10.1016/j.patrec.2023.10.004

Keywords

Multimodal named entity recognition; Multi-task learning; Cross-modal learning


This paper introduces a method for improving Multimodal Named Entity Recognition (MNER) through cross-modal auxiliary tasks. The method uses cross-modal matching and cross-modal mutual information maximization to address mismatched image-text pairs, and separates the features of the main task and the auxiliary tasks through a cross-modal gate-control mechanism.
Although existing Multimodal Named Entity Recognition (MNER) methods have achieved promising performance, they suffer from two drawbacks in social media scenarios. First, most existing methods rest on the strong assumption that the textual content and the associated image are matched, which does not always hold in real scenarios. Second, current methods fail to filter out modality-specific random noise, which prevents models from exploiting modality-shared features. This paper puts forward a novel multi-task multimodal learning architecture that aims to improve MNER performance through cross-modal auxiliary tasks (CMAT). Specifically, the shared and task-specific features of the main task and the auxiliary tasks are first separated by a cross-modal gate-control mechanism. Then, without extra pre-processing or annotations, cross-modal matching is used to address the issue of mismatched image-text pairs, and cross-modal mutual information maximization is used to retain the most relevant cross-modal features. Experimental results on two widely used datasets confirm the superiority of the proposed approach.
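To illustrate the gate-control idea mentioned in the abstract, the following is a minimal sketch of sigmoid-gated fusion of textual and visual features. It is only a generic illustration of such a mechanism, not the authors' exact formulation; the function name `cross_modal_gate` and the parameterization are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_gate(text_feat, image_feat, W, b):
    """Gate-control fusion: a sigmoid gate decides, per dimension, how much
    visual information flows into the textual representation.
    (Illustrative form only; the paper's exact mechanism may differ.)"""
    gate = sigmoid(np.concatenate([text_feat, image_feat]) @ W + b)
    # Convex combination: each output dimension lies between the two inputs.
    return gate * text_feat + (1.0 - gate) * image_feat

d = 4
text_feat = rng.standard_normal(d)
image_feat = rng.standard_normal(d)
W = rng.standard_normal((2 * d, d)) * 0.1  # toy, randomly initialized gate weights
b = np.zeros(d)

fused = cross_modal_gate(text_feat, image_feat, W, b)
print(fused.shape)  # (4,)
```

Because the gate output lies in (0, 1), the fused vector is a per-dimension convex combination of the two modality features, which is what lets a learned gate suppress modality-specific noise while passing through shared features.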
