4.6 Article

Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval

Related references

Note: only some of the references are listed; download the original article for the full reference information.

Article Computer Science, Artificial Intelligence

Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown

Silvan Heller et al.

Summary: The Video Browser Showdown is an annual interactive evaluation campaign that tackles difficult video search challenges by attracting research teams to participate in testing and discussing interactive video retrieval systems. The 2021 showdown was the first to be held remotely, with a record number of sixteen scoring systems competing.

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL (2022)

Article Engineering, Electrical & Electronic

MARS: Learning Modality-Agnostic Representation for Scalable Cross-Media Retrieval

Yunbo Wang et al.

Summary: This paper proposes MARS, a cross-media retrieval (CMR) method that allows each modality to be trained independently, improving the flexibility and practicality of CMR. MARS introduces a label parsing module and a modality-specific representation module to generate modality-agnostic semantic representations, and trains them with the same objective for better semantic alignment. Experimental results show that MARS outperforms existing methods.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Proceedings Paper Acoustics

FUSION AND ORTHOGONAL PROJECTION FOR IMPROVED FACE-VOICE ASSOCIATION

Muhammad Saad Saeed et al.

Summary: The study proposes an effective mechanism for improving face-voice association by enriching feature representation and utilizing orthogonal constraints for clustering. The framework performs well on a large dataset and is more effective and efficient compared to contemporary methods.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Article Computer Science, Information Systems

Disentangled Representation Learning for Cross-Modal Biometric Matching

Hailong Ning et al.

Summary: Cross-modal biometric matching (CMBM) aims to determine the corresponding face from a voice or to identify the corresponding voice from a face. This study proposes a disentangled representation learning method for CMBM that separates alignable identity factors from modality-dependent factors. The method consists of a feature extraction step and a disentangled representation learning step. Experimental results show that the proposed method outperforms state-of-the-art methods on the VoxCeleb dataset.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

Peisong Wen et al.

Summary: This paper introduces a novel framework for automatically learning the association between voice and face, addressing the issues of using local information and of differing learning difficulties across subjects. With a two-level modality alignment loss and a dynamic reweighting scheme, the proposed method outperforms previous methods on voice-face matching, verification, and retrieval tasks.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Acoustics

LEARNING AUDIO-VISUAL CORRELATIONS FROM VARIATIONAL CROSS-MODAL GENERATION

Ye Zhu et al.

Summary: This study proposes a novel self-supervised method for exploring audio-visual correlations, using a VAE framework with multiple encoders, a shared decoder, and an additional Wasserstein distance constraint. Experimental results show that the approach effectively learns audio-visual relationships and achieves competitive performance on multiple audio-visual downstream tasks.

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) (2021)

Article Mathematical & Computational Biology

An Overview of Image Caption Generation Methods

Haoran Wang et al.

COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE (2020)

Article Computer Science, Artificial Intelligence

Modality-specific and shared generative adversarial network for cross-modal retrieval

Fei Wu et al.

PATTERN RECOGNITION (2020)

Article Computer Science, Artificial Intelligence

A study on deep learning spatiotemporal models and feature extraction techniques for video understanding

M. Suresha et al.

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL (2020)

Proceedings Paper Computer Science, Artificial Intelligence

On Learning Associations of Faces and Voices

Changil Kim et al.

COMPUTER VISION - ACCV 2018, PT V (2019)

Review Psychology, Multidisciplinary

The Modality-Specific Learning Style Hypothesis: A Mini-Review

Karoline Aslaksen et al.

FRONTIERS IN PSYCHOLOGY (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Seeing Voices and Hearing Faces: Cross-modal biometric matching

Arsha Nagrani et al.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Look, Listen and Learn

Relja Arandjelovic et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Article Psychology

Matching novel face and voice identity using static and dynamic facial images

Harriet M. J. Smith et al.

ATTENTION PERCEPTION & PSYCHOPHYSICS (2016)