Proceedings Paper

Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3123266.3123369

Keywords

Cross-media retrieval; rich semantic embeddings; multi-sensory fusion; TextNet

Funding

  1. Shenzhen Peacock Plan [20130408183003656]
  2. Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality [ZDSYS201703031405467]
  3. Guangdong Science and Technology Project [2014B010117007]

Abstract

Cross-media retrieval aims at discovering the semantic associations between different media types. Most existing methods have paid much attention to learning mapping functions or finding optimal common spaces, but have neglected how people accurately perceive images and texts. This paper proposes a brain-inspired cross-media retrieval framework that learns rich semantic embeddings of multimedia. Rather than directly using off-the-shelf image features, we combine the visual and descriptive senses of an image, from the perspective of human perception, via a joint model called the multi-sensory fusion network (MSFN). A topic-model-based TextNet maps texts into the same semantic space as images according to their shared ground-truth labels. Moreover, to overcome the limitations of insufficient data for training neural networks and the low complexity of existing text corpora, we introduce a large-scale image-text dataset, called the Britannica dataset. Extensive experiments show the effectiveness of our framework for texts of different lengths on three benchmark datasets as well as on the Britannica dataset. Above all, we report the best known average Img2Text and Text2Img results compared with several state-of-the-art methods.
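Once images and texts are embedded into a shared semantic space, both retrieval directions (Img2Text and Text2Img) reduce to nearest-neighbor search under cosine similarity between embedding vectors. The sketch below illustrates this retrieval step only; it is not the paper's MSFN or TextNet, and the function names and toy vectors are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project embeddings onto the unit sphere so that a dot
    # product between two rows equals their cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def retrieve(queries, gallery, top_k=5):
    # Rank gallery embeddings by cosine similarity to each query
    # embedding; returns the indices of the top_k matches per query.
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    return np.argsort(-sims, axis=1)[:, :top_k]

# Toy 2-D embeddings standing in for image and text vectors that a
# joint model would produce in a shared semantic space (assumption).
img_emb = np.array([[1.0, 0.0], [0.0, 1.0]])          # two image queries
txt_emb = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]])  # text gallery

img2text = retrieve(img_emb, txt_emb, top_k=3)  # ranked text indices per image
```

Text2Img is the same call with the roles swapped: `retrieve(txt_emb, img_emb)`. Because both modalities live in one space, a single similarity function serves both directions.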
