Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume 30, Pages 3252-3262

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIP.2021.3058614

Keywords

Grounding; Annotations; Two dimensional displays; Training; Feature extraction; Computational modeling; Task analysis; Weakly supervised; temporal sentence grounding

Funding

National Key Research and Development Program [2018YFB0804204]
Strategic Priority Research Program of Chinese Academy of Sciences [XDC02050500]
National Natural Science Foundation of China [62022078, 62021001]
Youth Innovation Promotion Association CAS [2018166]
Open Project Program of the National Laboratory of Pattern Recognition (NLPR) [202000019]

Automated Summary

LCNet represents video and text features hierarchically and introduces a self-supervised cycle-consistent loss to learn the matching relationships between video and text, outperforming existing weakly supervised methods.

Abstract

Weakly supervised temporal sentence grounding has better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model fine-grained local video-text correspondences well and lack effective supervision for correspondence learning, which leads to unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet has two main merits. First, we represent video and text features in a hierarchical manner to model the fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as learning guidance for video and text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding by using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.
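The abstract does not give the formulation of the cycle-consistent loss, so the following is only a minimal sketch of the general idea behind cycle consistency between two feature sets, not LCNet's actual loss. It assumes word and clip features are plain vectors compared by dot product; the function name `cycle_consistency_loss`, the `temperature` parameter, and the soft-attention design are all illustrative assumptions. Each word attends over the clips, the attended clip feature attends back over the words, and the loss penalizes cycles that do not return to the starting word.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cycle_consistency_loss(words, clips, temperature=0.1):
    """Generic word -> clip -> word cycle loss (hypothetical, not LCNet's).

    words: list of word feature vectors
    clips: list of clip feature vectors of the same dimension
    Returns the mean negative log-probability of returning to the start word.
    """
    dim = len(words[0])
    loss = 0.0
    for i, word in enumerate(words):
        # Forward step: soft attention from this word over all clips.
        attn = softmax([dot(word, clip) / temperature for clip in clips])
        # Attention-weighted clip feature (a "soft nearest neighbor").
        soft_clip = [
            sum(a * clip[d] for a, clip in zip(attn, clips)) for d in range(dim)
        ]
        # Backward step: soft attention from that clip feature over all words.
        back = softmax([dot(soft_clip, w) / temperature for w in words])
        # Cross-entropy against the index of the starting word.
        loss += -math.log(back[i] + 1e-12)
    return loss / len(words)


# Toy features: two words, two clips, two dimensions.
toy_words = [[1.0, 0.0], [0.0, 1.0]]
aligned_clips = [[1.0, 0.0], [0.0, 1.0]]    # each clip matches one word
ambiguous_clips = [[1.0, 1.0], [1.0, 1.0]]  # clips cannot tell words apart

aligned_loss = cycle_consistency_loss(toy_words, aligned_clips)
ambiguous_loss = cycle_consistency_loss(toy_words, ambiguous_clips)
```

In this toy setting the loss is close to zero when every word has a distinct matching clip, and it rises to about ln 2 when the clips carry no information that separates the two words. No word-to-clip annotations are needed, which is why a signal of this kind can be described as self-supervised.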