Article

Spatio-temporal matching for siamese visual tracking

Journal

NEUROCOMPUTING
Volume 522, Issue -, Pages 73-88

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2022.11.093

Keywords

Visual tracking; Siamese network; Spatio-temporal matching; Instance-aware; Temporal consistency


In this paper, a spatio-temporal matching process is proposed to thoroughly explore the capability of 4-D matching in space and time. The SI-Corr method is introduced in spatial matching to calibrate channel-wise responses for each matching position and distinguish the target and distractors. The ARM module is designed in temporal matching to restrict the abrupt alteration of interframe response maps and learn a temporal consistency of context structure distribution. Experimental results demonstrate the state-of-the-art performance of the proposed method in six benchmark tests.
Siamese trackers formulate the visual tracking task as a similarity matching problem through cross correlation. It is arduous for such methods to track targets in the presence of distractors. We suspect the reasons are twofold: 1) The irrelevant activated channels in the correlation map will produce ambiguous matching results. 2) The pipeline is a per-frame matching process and cannot handle the response aberrance caused by temporal context variation. In this paper, we propose a spatio-temporal matching process to thoroughly explore the capability of 4-D matching in space (height, width and channel) and time. In spatial matching, we introduce a space-variant instance-aware correlation (SI-Corr) to implement different channel-wise response recalibration for each matching position. SI-Corr can guide the generation of instance-aware features and distinguish the target and distractors at the instance level. In temporal matching, we design an aberrance repressed module (ARM) to investigate the short-term positional relationship between the target and distractors. ARM utilizes a simple optimization method to restrict the abrupt alteration of the interframe response maps, which allows the network to learn a temporal consistency of context structure distribution. Moreover, we efficiently embed temporal consistency into the inference process. Experiments on six benchmarks, including OTB100, VOT2018, VOT2020, GOT-10k, LaSOT and TrackingNet, demonstrate the state-of-the-art performance of the proposed method. (c) 2022 Elsevier B.V. All rights reserved.
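
The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names, not the paper's SI-Corr or ARM. It assumes a depth-wise cross-correlation between template and search features, a per-position, per-channel sigmoid gate as a stand-in for "space-variant channel-wise recalibration", and a squared difference between consecutive response maps as a stand-in for "restricting abrupt alteration". All function names and the gate inputs are hypothetical, and pure Python lists are used only to keep the example dependency-free.

```python
import math


def depthwise_xcorr(search, template):
    """Slide each template channel over the matching search channel.

    Inputs are nested lists shaped [C][H][W]; the result is [C][Ho][Wo].
    """
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    out = []
    for c in range(len(template)):
        plane = []
        for i in range(sh - th + 1):
            row = []
            for j in range(sw - tw + 1):
                row.append(sum(
                    search[c][i + u][j + v] * template[c][u][v]
                    for u in range(th) for v in range(tw)
                ))
            plane.append(row)
        out.append(plane)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate(corr, gate_logits):
    """Scale every channel at every position by its own gate in (0, 1).

    gate_logits has the same [C][Ho][Wo] shape as corr. In a trained
    tracker it would come from a learned branch; here it is a
    hypothetical input supplied by the caller.
    """
    return [
        [[value * sigmoid(logit) for value, logit in zip(c_row, g_row)]
         for c_row, g_row in zip(c_plane, g_plane)]
        for c_plane, g_plane in zip(corr, gate_logits)
    ]


def response_map(corr):
    """Sum over channels to obtain a single-channel response map."""
    rows, cols = len(corr[0]), len(corr[0][0])
    return [[sum(plane[i][j] for plane in corr) for j in range(cols)]
            for i in range(rows)]


def aberrance_penalty(prev_resp, curr_resp):
    """Squared L2 distance between consecutive response maps.

    An assumed, simplified temporal-consistency term: it is zero when
    the response map does not change between frames.
    """
    return sum((a - b) ** 2
               for p_row, c_row in zip(prev_resp, curr_resp)
               for a, b in zip(p_row, c_row))
```

In this sketch, a gate that varies with position lets a channel contribute strongly at one matching location and weakly at another, while the penalty grows whenever the response map shifts sharply between frames. How the paper actually predicts the recalibration weights or optimizes the temporal term is not stated in the abstract.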
