Article

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

Publisher

IEEE Computer Society
DOI: 10.1109/TPAMI.2020.3009758

Keywords

Image search; object detection; re-identification; retrieval; deep learning

Funding

  1. National Science Fund of China [61771079]
  2. Chongqing Youth Talent Program


The traditional object retrieval task requires precise cropping of the objects and learning of discriminative feature representations. In real-world scenarios, however, accurate detection or annotation of objects is often unavailable, which calls for image-level search with joint detection and retrieval. This paper proposes an Integrated Net (I-Net) and an improved version, DC-I-Net, to tackle this challenge.
The traditional object (person) retrieval (re-identification) task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, which assumes that the objects in an image have been manually or automatically pre-cropped exactly. However, in many real-world search scenarios (e.g., video surveillance), the objects (e.g., persons, vehicles, etc.) are seldom accurately detected or annotated. Object-level retrieval therefore becomes intractable without bounding-box annotations, which leads to a new but challenging topic, i.e., image-level search with multi-task integration of joint detection and retrieval.

To address the image search issue, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A Siamese architecture and an on-line pairing strategy for similar and dissimilar objects in the given images are designed; benefiting from the Siamese structure, I-Net learns a shared feature representation on which both the object detection and classification tasks are handled. 2) A novel on-line pairing (OLP) loss with a dynamic feature dictionary is introduced, which alleviates the multi-task training stagnation problem by automatically generating a large number of negative pairs to constrain the positives. 3) A hard example priority (HEP) based softmax loss is proposed to improve the robustness of the classification task by selecting hard categories.

The shared feature representation of I-Net may, however, restrict the task-specific flexibility and learning capability of the detection and retrieval tasks. Therefore, following the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) Two modules are tailored to handle the different tasks separately within the integrated framework, such that task specificity is guaranteed. 2) A class-center guided HEP loss (C2HEP) that exploits the stored class centers is proposed, such that intra-similarity and inter-dissimilarity can be captured for the ultimate retrieval. Extensive experiments on well-known image-level search benchmarks, such as the CUHK-SYSU and PRW datasets for person search and the large-scale WebTattoo dataset for tattoo search, demonstrate that the proposed DC-I-Net outperforms state-of-the-art tasks-integrated and tasks-separated image search models.
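The OLP loss is only described at a high level in the abstract. Below is a minimal PyTorch-style sketch of how an on-line pairing loss with a dynamic feature dictionary could look, assuming the dictionary is a fixed-size circular buffer of recent object features that supplies negatives for each positive pair; the class name, dictionary size, and temperature are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


class OnlinePairingLoss(torch.nn.Module):
    """Illustrative sketch of an OLP-style loss with a dynamic feature dictionary.

    A fixed-size buffer of recent, L2-normalised object features acts as an
    automatically generated pool of negatives, so each positive pair is
    contrasted against many negatives in a softmax over cosine similarities.
    Dictionary size and temperature are illustrative choices only.
    """

    def __init__(self, feat_dim=256, dict_size=4096, temperature=0.1):
        super().__init__()
        self.temperature = temperature
        self.register_buffer("dictionary", torch.zeros(dict_size, feat_dim))
        self.register_buffer("ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _enqueue(self, feats):
        # Overwrite the oldest dictionary entries with the newest features
        # (circular buffer), which is what keeps the negative pool "dynamic".
        n = feats.size(0)
        start = int(self.ptr)
        idx = (torch.arange(n, device=feats.device) + start) % self.dictionary.size(0)
        self.dictionary[idx] = feats
        self.ptr[0] = (start + n) % self.dictionary.size(0)

    def forward(self, anchor, positive):
        # anchor, positive: (B, D) features of matched object pairs coming
        # from the two Siamese branches.
        anchor = F.normalize(anchor, dim=1)
        positive = F.normalize(positive, dim=1)

        pos_sim = (anchor * positive).sum(dim=1, keepdim=True)   # (B, 1)
        neg_sim = anchor @ self.dictionary.t()                   # (B, K)

        # The positive similarity must win a softmax against all dictionary
        # entries; index 0 is therefore the target class for every sample.
        logits = torch.cat([pos_sim, neg_sim], dim=1) / self.temperature
        target = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
        loss = F.cross_entropy(logits, target)

        self._enqueue(positive.detach())
        return loss
```

Because the buffer is refreshed after every step, each positive pair is pushed apart from an ever-changing set of other objects, which is one plausible way the abstract's "automatically generated negative pairs" could be realised.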
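Similarly, the class-center guided HEP (C2HEP) idea can be sketched under the assumption that class centers are kept as moving-average prototypes and that "hard example priority" means keeping only the highest-scoring wrong classes in the softmax. All names and hyper-parameters below (top_k, momentum, temperature) are hypothetical and only illustrate the stated idea of exploiting stored class centers for intra-similarity and inter-dissimilarity.

```python
import torch
import torch.nn.functional as F


class ClassCenterGuidedLoss(torch.nn.Module):
    """Illustrative sketch of a class-center guided, hard-category softmax loss.

    Per-class centers are updated with a moving average; each feature is scored
    against all centers, and only the ground-truth class plus the top-k hardest
    (highest-scoring) wrong classes enter the softmax.
    """

    def __init__(self, num_classes, feat_dim=256, top_k=32, momentum=0.5, temperature=0.1):
        super().__init__()
        self.top_k = top_k
        self.momentum = momentum
        self.temperature = temperature
        self.register_buffer("centers", torch.zeros(num_classes, feat_dim))

    @torch.no_grad()
    def _update_centers(self, feats, labels):
        # Moving-average update of the stored class centers.
        for f, y in zip(feats, labels):
            c = self.momentum * self.centers[y] + (1.0 - self.momentum) * f
            self.centers[y] = F.normalize(c, dim=0)

    def forward(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        logits = feats @ self.centers.t() / self.temperature          # (B, C)

        # Ground-truth logit plus the top-k hardest wrong-class logits.
        gt = logits.gather(1, labels.unsqueeze(1))                    # (B, 1)
        mask = F.one_hot(labels, num_classes=logits.size(1)).bool()
        wrong = logits.masked_fill(mask, float("-inf"))
        k = min(self.top_k, logits.size(1) - 1)
        hard = wrong.topk(k, dim=1).values                            # (B, k)

        selected = torch.cat([gt, hard], dim=1)
        target = torch.zeros(feats.size(0), dtype=torch.long, device=feats.device)
        loss = F.cross_entropy(selected, target)

        self._update_centers(feats.detach(), labels)
        return loss
```

Pulling features toward their own center while pushing them away from the hardest competing centers is one straightforward reading of how stored class centers could enforce intra-similarity and inter-dissimilarity for retrieval.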

