Article

Deep Object Co-Segmentation and Co-Saliency Detection via High-Order Spatial-Semantic Network Modulation

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 25, Pages 5733-5746

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2022.3198848

Keywords

Co-saliency detection; co-segmentation; graph aggregation; network modulation

This paper proposes an adaptive spatially and high-order semantically modulated deep network framework for object co-segmentation and co-saliency detection. The framework extracts multi-resolution image features. It then uses an adaptive spatial modulator to highlight co-object regions and a high-order semantic modulator to learn rich semantic features. Experiments on six benchmark datasets (three for co-segmentation, three for co-saliency detection) show that the proposed method is more accurate than competing methods.
Object co-segmentation (CSG) aims to segment the common objects of the same category in multiple relevant images, while co-saliency detection (CSD) aims to discover the salient and common foreground objects in a group of images. To handle both tasks simultaneously, this paper presents an adaptive spatially and high-order semantically modulated deep network framework. A backbone network first extracts multi-resolution image features. Taking the multi-resolution features of the relevant images as input, we design an adaptive spatial modulator that learns, for each image, a spatial representation highlighting the co-object regions. The adaptive spatial modulator fully captures the rich correlations among all image feature descriptors via unsupervised clustering and a graph aggregation strategy. The learned representation localizes the common foreground object well while effectively suppressing background signals. The high-order semantic modulator is modeled as a supervised image classification task, for which we propose a hierarchical high-order pooling module that learns rich semantic features. The outputs of the two modulators manipulate the multi-resolution features through a shift-and-scale operation so that the features focus on segmenting common object regions. The proposed model is trained end-to-end without any intricate post-processing. Extensive experiments on three CSG benchmark datasets (MSRC, i-Coseg, and PASCAL-VOC) and three CSD datasets (Cosal2015, CoCA, and CoSOD3k) demonstrate that the proposed method achieves superior accuracy compared with state-of-the-art methods on both tasks.
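The abstract says the two modulator outputs manipulate the multi-resolution features through a shift-and-scale operation, but it does not give the formula. The sketch below is one plausible reading under stated assumptions and is not the authors' implementation. It assumes that the spatial modulator produces one H x W weight map per image. It also assumes that the semantic modulator produces a per-channel scale `gamma` and shift `beta`. Under those assumptions, the two combine FiLM-style as `gamma[c] * S[h][w] * F[c][h][w] + beta[c]`. The function name `shift_and_scale` and the exact combination are hypothetical, and plain Python lists stand in for framework tensors.

```python
from typing import List

Matrix = List[List[float]]


def shift_and_scale(
    features: List[Matrix],  # C channels, each an H x W feature map
    spatial: Matrix,         # H x W map from a spatial modulator (assumed: 1 = co-object, 0 = background)
    gamma: List[float],      # hypothetical per-channel scale from a semantic modulator
    beta: List[float],       # hypothetical per-channel shift from a semantic modulator
) -> List[Matrix]:
    """Hypothetical modulation: out[c][h][w] = gamma[c] * spatial[h][w] * features[c][h][w] + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("gamma and beta need one entry per channel")
    out: List[Matrix] = []
    for c, fmap in enumerate(features):
        # The spatial map is shared across channels, so it must match every channel's H x W shape.
        if len(fmap) != len(spatial) or any(len(r) != len(s) for r, s in zip(fmap, spatial)):
            raise ValueError("spatial map must match each channel's H x W shape")
        # Scale each activation by the channel scale and its spatial weight, then add the channel shift.
        out.append([
            [gamma[c] * s * f + beta[c] for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(fmap, spatial)
        ])
    return out


if __name__ == "__main__":
    features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    spatial = [[1.0, 0.0], [0.5, 1.0]]
    print(shift_and_scale(features, spatial, gamma=[2.0, 1.0], beta=[0.0, 0.25]))
    # → [[[2.0, 0.0], [3.0, 8.0]], [[0.75, 0.25], [0.5, 0.75]]]
```

Where the spatial map is 0, the output collapses to the channel shift `beta[c]`. This shows how such a modulation could suppress background responses while the per-channel scale reweights semantic channels. A real model would apply this to batched tensors at each feature resolution, and the paper may combine the two modulators differently.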
