Proceedings Paper

Natural Language Object Retrieval

Publisher

IEEE
DOI: 10.1109/CVPR.2016.493

Funding

  1. FITweltweit-Program of the German Academic Exchange Service (DAAD)
  2. NUS startup grant [R263000008133]
  3. DARPA
  4. AFRL
  5. DoD MURI [N000141110688]
  6. NSF [IIS-1427425, IIS-1212798]
  7. Berkeley Vision and Learning Center

In this paper, we address the task of natural language object retrieval: localizing a target object within a given image based on a natural language query describing the object. Natural language object retrieval differs from the text-based image retrieval task in that it involves spatial information about objects within the scene as well as global scene context. To address this issue, we propose a novel Spatial Context Recurrent ConvNet (SCRC) model as a scoring function on candidate boxes for object retrieval, integrating spatial configurations and global scene-level contextual information into the network. Our model processes query text, local image descriptors, spatial configurations and global context features through a recurrent network, outputs the probability of the query text conditioned on each candidate box as a score for the box, and can transfer visual-linguistic knowledge from the image captioning domain to our task. Experimental results demonstrate that our method effectively utilizes both local and global information, outperforming previous baseline methods significantly on different datasets and scenarios, and can exploit large-scale vision and language datasets for knowledge transfer.
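
The scoring idea in the abstract (rank each candidate box by the probability of the query text given that box) can be illustrated with a small sketch. This is not the SCRC implementation: the class `ToyQueryScorer`, the functions `spatial_features` and `retrieve`, the 8-value box encoding, the feature sizes, and the random untrained weights are all assumptions made for illustration. A plain tanh recurrent cell stands in for whatever recurrent unit the paper uses, and random vectors stand in for ConvNet features.

```python
import numpy as np

def spatial_features(box, img_w, img_h):
    """Encode a box (xmin, ymin, xmax, ymax) as 8 normalized values.
    This encoding is an assumption for illustration, not taken from the abstract."""
    xmin, ymin, xmax, ymax = box
    return np.array([
        xmin / img_w, ymin / img_h, xmax / img_w, ymax / img_h,
        (xmin + xmax) / (2 * img_w), (ymin + ymax) / (2 * img_h),
        (xmax - xmin) / img_w, (ymax - ymin) / img_h,
    ])

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

class ToyQueryScorer:
    """Hypothetical recurrent scorer with random, untrained weights."""

    def __init__(self, vocab_size, ctx_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(0, 0.1, (vocab_size, hidden_dim))    # word embeddings
        self.W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))  # recurrent weights
        self.W_c = rng.normal(0, 0.1, (ctx_dim, hidden_dim))     # context projection
        self.W_o = rng.normal(0, 0.1, (hidden_dim, vocab_size))  # output layer
        self.hidden_dim = hidden_dim

    def log_prob(self, word_ids, context):
        """Return log p(query | context) = sum_t log p(w_t | w_<t, context)."""
        h = np.zeros(self.hidden_dim)
        prev = np.zeros(self.hidden_dim)  # zero vector plays the role of a start token
        total = 0.0
        for w in word_ids:
            h = np.tanh(prev + h @ self.W_h + context @ self.W_c)
            total += log_softmax(h @ self.W_o)[w]
            prev = self.E[w]
        return total

def retrieve(scorer, word_ids, boxes, local_feats, global_feat, img_w, img_h):
    """Score every candidate box and return (index of best box, all scores)."""
    scores = []
    for box, local in zip(boxes, local_feats):
        # Context = local descriptor + spatial configuration + global scene feature.
        ctx = np.concatenate([local, spatial_features(box, img_w, img_h), global_feat])
        scores.append(scorer.log_prob(word_ids, ctx))
    scores = np.array(scores)
    return int(scores.argmax()), scores

# Usage with made-up data: 3 candidate boxes in a 640x480 image.
rng = np.random.default_rng(1)
boxes = [(10, 20, 110, 220), (200, 50, 400, 300), (0, 0, 640, 480)]
local_feats = rng.normal(size=(3, 32))   # stand-in for per-box ConvNet descriptors
global_feat = rng.normal(size=32)        # stand-in for a whole-image descriptor
scorer = ToyQueryScorer(vocab_size=50, ctx_dim=32 + 8 + 32)
query_ids = [3, 17, 42]                  # hypothetical token ids for the query
best, scores = retrieve(scorer, query_ids, boxes, local_feats, global_feat, 640, 480)
print("best box:", boxes[best], "log-probabilities:", scores)
```

Because the weights here are random, the ranking carries no meaning; the sketch only shows how local, spatial, and global features could be combined into one conditioning vector and how a per-box log-probability of the query serves as the retrieval score.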
