Article

Pseudo loss active learning for deep visual tracking

Journal

PATTERN RECOGNITION
Volume 130

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108773

Keywords

Active learning; Visual tracking; Pseudo loss; Pseudo label

Funding

  1. National Key R&D Program of China [2018AAA0101501]
  2. Science and Technology Project of SGCC (State Grid Corporation of China)

This paper proposes a pseudo loss active learning (PLAL) method for visual tracking tasks, aiming to reduce the labor and time required for manual labeling by selecting informative and non-redundant training data. The PLAL method generates pseudo labels based on a tracking model and computes pseudo loss to measure the uncertainty of the target spatial context. Experimental results demonstrate the effectiveness of PLAL compared to baseline and other active learning approaches.
In visual tracking, training data typically consist of a large number of video sequences in which every frame must be labeled manually, which is labor-intensive and time-consuming. Moreover, because consecutive frames in the same sequence are similar, the training data contain significant redundancy. To address these problems, this paper develops a novel pseudo loss active learning (PLAL) method. PLAL aims to select the most informative and least redundant data for training, reducing labeling cost while maintaining competitive tracking results. First, a Gaussian-distribution-based pseudo label is generated for each unlabeled candidate using a tracking model that is initially trained on a small amount of training data. Then, a pseudo loss based on cross entropy is designed to compute the difference between the pseudo label and the target response map. The pseudo loss measures the uncertainty of the target spatial context and serves as the informativeness criterion for frame selection. Meanwhile, a sampling interval threshold and a temporal penalty are employed during frame selection to avoid drastic variation in target appearance and to reduce redundancy among consecutive candidate frames. Only the selected frames are labeled by the oracle (human expert) and then added to the training data. Extensive experiments on public benchmarks (OTB2013, OTB2015, VOT2018, UAV123, GOT-10K, TrackingNet, LaSOT, OxUvA and TLP) demonstrate that the PLAL method outperforms the baseline and other recent active learning approaches. With only 3% of the labeled data from the training dataset, PLAL reaches competitive performance (98-100%) compared to the model trained on the entire training dataset. (C) 2022 Elsevier Ltd. All rights reserved.
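
The selection pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so everything below is an assumption made for illustration. In particular, the pseudo label is taken to be a Gaussian centred on the peak of the response map, the response map is assumed to hold values in [0, 1], and the function names (`gaussian_pseudo_label`, `pseudo_loss`, `select_frames`) and parameters (`sigma`, `min_gap`, `penalty`, `decay`) are hypothetical. The hard frame gap and the decaying score penalty are stand-ins for the sampling interval threshold and temporal penalty, whose actual form may differ in the paper.

```python
import numpy as np


def gaussian_pseudo_label(response, sigma=2.0):
    """Gaussian map centred on the response peak (assumed construction)."""
    h, w = response.shape
    cy, cx = np.unravel_index(np.argmax(response), response.shape)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def pseudo_loss(response, label, eps=1e-7):
    """Mean binary cross entropy between response map and pseudo label.

    Assumes response values lie in [0, 1], e.g. after a sigmoid.
    """
    p = np.clip(response, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))


def select_frames(losses, budget, min_gap=3, penalty=0.5, decay=5.0):
    """Greedily pick up to `budget` frames with the highest pseudo loss.

    Illustrative stand-ins for the paper's temporal rules:
    - frames closer than `min_gap` to a picked frame are excluded;
    - remaining scores are down-weighted, more strongly for nearby frames.
    Requires 0 <= penalty < 1 so that scores are never multiplied by zero.
    """
    scores = np.asarray(losses, dtype=float).copy()
    frames = np.arange(len(scores))
    selected = []
    for _ in range(min(budget, len(scores))):
        idx = int(np.argmax(scores))
        if scores[idx] == -np.inf:  # every remaining frame is excluded
            break
        selected.append(idx)
        dist = np.abs(frames - idx)
        scores = scores * (1.0 - penalty * np.exp(-dist / decay))
        scores[dist < min_gap] = -np.inf
    return sorted(selected)
```

In an active learning round under these assumptions, each unlabeled frame would be scored with `pseudo_loss(response, gaussian_pseudo_label(response))`, the per-sequence scores passed to `select_frames`, and only the returned frame indices sent to the oracle for labeling.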
