Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval

Article Computer Science, Artificial Intelligence

Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown

Silvan Heller et al.

Summary: The Video Browser Showdown is an annual interactive evaluation campaign that tackles difficult video search challenges by attracting research teams to participate in testing and discussing interactive video retrieval systems. The 2021 showdown was the first to be held remotely, with a record number of sixteen scoring systems competing.

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL (2022)

Article Engineering, Electrical & Electronic

MARS: Learning Modality-Agnostic Representation for Scalable Cross-Media Retrieval

Yunbo Wang et al.

Summary: This paper proposes MARS, a cross-media retrieval (CMR) method that allows each modality to be trained independently, improving the flexibility and practicality of CMR. MARS introduces a label parsing module and a modality-specific representation module to generate modality-agnostic semantic representations, and trains them using the same objective for better semantic alignment. Experimental results demonstrate that MARS outperforms existing methods.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Proceedings Paper Acoustics

Fusion and Orthogonal Projection for Improved Face-Voice Association

Muhammad Saad Saeed et al.

Summary: The study proposes a mechanism for improving face-voice association by enriching the feature representation and using orthogonal constraints for clustering. The framework performs well on a large dataset and is more effective and efficient than contemporary methods.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Article Computer Science, Information Systems

Disentangled Representation Learning for Cross-Modal Biometric Matching

Hailong Ning et al.

Summary: Cross-modal biometric matching (CMBM) aims to determine the corresponding face from a voice or the corresponding voice from a face. This study proposes a disentangled representation learning method for CMBM that separates alignable identity factors from modality-dependent factors. The method consists of a feature extraction step and a disentangled representation learning step. Experimental results show that the proposed method outperforms state-of-the-art methods on the VoxCeleb dataset.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

Peisong Wen et al.

Summary: This paper introduces a framework for automatically learning the association between voice and face. It addresses two issues: the use of local information, and differing learning difficulty across subjects. With a two-level modality alignment loss and a dynamic reweighting scheme, the proposed method outperforms previous methods on voice-face matching, verification, and retrieval tasks.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Acoustics

Learning Audio-Visual Correlations from Variational Cross-Modal Generation

Ye Zhu et al.

Summary: This study proposes a self-supervised method for exploring audio-visual correlations, using a VAE framework with multiple encoders, a shared decoder, and an additional Wasserstein distance constraint. Experimental results demonstrate that the approach effectively learns audio-visual relationships and achieves competitive performance on multiple audio-visual downstream tasks.

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) (2021)

Article Mathematical & Computational Biology