Article

Hierarchical self-adaptation network for multimodal named entity recognition in social media

Journal

NEUROCOMPUTING
Volume 439, Pages 12-21

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.01.060

Keywords

Multimodal; Named entity recognition; Hierarchical self-adaptation network

Multimodal Named Entity Recognition aims to identify named entities in user-generated posts that contain both images and text. Previous methods benefit from visual features when the text and image are aligned, but may fail when the image is missing or mismatched. To address these issues, a novel model, HSN, is proposed; it achieves state-of-the-art results on the Real-world multimodal NER dataset and the Twitter multimodal NER dataset.
The Multimodal Named Entity Recognition task aims to identify named entities in user-generated posts containing both images and text. Previous multimodal named entity recognition methods benefit greatly from visual features when the text and the image are well aligned, but this is not always the case in social media. When the image is missing or mismatched with the text, these models usually perform poorly. In addition, previous models use only a single attention mechanism to capture the semantic interaction between modalities, which largely ignores the presence of multiple entity objects in the images and text of a post. To alleviate these issues, we present a novel model named Hierarchical Self-adaptation Network (HSN). The HSN contains 1) a Cross-modal Interaction Module that promotes semantic interaction among the multiple entity objects in different modalities and is shown to suppress wrong or incomplete attention in multimodal interaction; and 2) a Self-adaptive Multimodal Integration module that handles cases where the image is missing or mismatched with the text. Additionally, to evaluate the adaptability of HSN in real-life social media, we construct a Real-world NER dataset consisting of plain-text posts and multimodal posts from Twitter. Extensive experiments demonstrate that our model achieves state-of-the-art results on the Real-world multimodal NER dataset and the Twitter multimodal NER dataset. (c) 2021 Published by Elsevier B.V.
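
The abstract does not specify how the Self-adaptive Multimodal Integration module is built. As a rough illustration only, the general idea of adapting to a missing or mismatched image can be sketched with a generic scalar gate over text and image feature vectors. This is a standard gated-fusion pattern, not the HSN architecture; the function name gated_fusion and the parameters gate_weights and gate_bias are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    text_vec: Sequence[float],
    image_vec: Optional[Sequence[float]],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Fuse text and image features with one scalar gate.

    Hypothetical sketch, not the method from the paper:
    - if the image is missing (None), return the text features unchanged;
    - otherwise compute gate = sigmoid(w . [text; image] + b) and return
      text + gate * image, so a gate near 0 suppresses the image.
    """
    if image_vec is None:
        # Plain-text post: fall back to the text representation alone.
        return list(text_vec)
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    if len(gate_weights) != 2 * len(text_vec):
        raise ValueError("gate_weights must cover the concatenated vector")

    concat = list(text_vec) + list(image_vec)
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)
    return [t + gate * v for t, v in zip(text_vec, image_vec)]
```

In a trained model the gate parameters would be learned, so that a mismatched image drives the gate toward 0 and the output stays close to the text-only representation.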
