4.8 Article

Binaural SoundNet: Predicting Semantics, Depth and Motion With Binaural Sounds

Related references

Note: Only some of the references are listed.
Article Computer Science, Artificial Intelligence

Multi-Task Learning for Dense Prediction Tasks: A Survey

Simon Vandenhende et al.

Summary: With the advent of deep learning, dense prediction tasks have significantly improved. Recent multi-task learning techniques have shown promising results by jointly tackling multiple tasks. This survey provides a comprehensive view on state-of-the-art deep learning approaches for multi-task learning in computer vision, with a focus on dense prediction tasks.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Robotics

Self-Supervised Visual Terrain Classification From Unsupervised Acoustic Feature Learning

Jannik Zurn et al.

Summary: This article proposes a novel terrain classification framework that utilizes an unsupervised proprioceptive classifier to learn from vehicle-terrain interaction sounds, enabling self-supervision of an exteroceptive classifier for pixelwise semantic segmentation of images.

IEEE TRANSACTIONS ON ROBOTICS (2021)

Article Robotics

Hearing What You Cannot See: Acoustic Vehicle Detection Around Corners

Yannick Schulz et al.

Summary: This study proposes passive acoustic perception as an additional sensing modality for intelligent vehicles, showing that vehicles behind blind corners can be detected through sound reflections before they enter the line of sight. A novel method classifies the approach direction of a vehicle before it becomes visible, achieving high accuracy on hidden-vehicle classification across the different environmental patterns considered.

IEEE ROBOTICS AND AUTOMATION LETTERS (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding

Christos Sakaridis et al.

Summary: The ACDC dataset consists of 4006 images equally distributed among four common adverse conditions, namely fog, nighttime, rain, and snow, for training and testing semantic segmentation methods. Each image is accompanied by high-quality fine pixel-level semantic annotation, a corresponding image of the same scene under normal conditions, and a binary mask to distinguish clear and uncertain semantic content within the image. This dataset supports both standard semantic segmentation and uncertainty-aware semantic segmentation, posing challenges to state-of-the-art supervised and unsupervised approaches while guiding future progress in the field.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Beyond Image to Depth: Improving Depth Prediction using Echoes

Kranti Kumar Parida et al.

Summary: In this study, we introduce a novel method for depth estimation by integrating RGB images, binaural echoes, and estimated material properties, achieving significant improvement in accuracy compared to existing audio-visual depth prediction methods.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

Yanbei Chen et al.

Summary: This study discusses the transfer of knowledge across heterogeneous modalities by composing representations from different modalities to uncover richer multi-modal knowledge. By learning a compositional embedding and utilizing compositional contrastive learning, it facilitates the convergence of representations across modalities.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Francisco Rivera Valverde et al.

Summary: Sound attributes of objects help in object detection and tracking. This study proposes a self-supervised framework that leverages multiple modalities to distill knowledge into an audio student network. Experimental results show that the approach outperforms existing methods in detecting multiple objects using only sound during inference.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Visually Informed Binaural Audio Generation without Binaural Audios

Xudong Xu et al.

Summary: In this study, a new effective pipeline called PseudoBinaural is proposed, which does not require binaural recordings but carefully builds pseudo visual-stereo pairs with mono data for training. Compared to fully-supervised paradigms, this method shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Review Psychology, Multidisciplinary

Spatial Soundscapes and Virtual Worlds: Challenges and Opportunities

Chinmay Rajguru et al.

FRONTIERS IN PSYCHOLOGY (2020)

Article Computer Science, Artificial Intelligence

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Liang-Chieh Chen et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2018)

Article Computer Science, Software Engineering

Scene-Aware Audio for 360° Videos

Dingzeyu Li et al.

ACM TRANSACTIONS ON GRAPHICS (2018)

Article Biochemical Research Methods

A fully autonomous terrestrial bat-like acoustic robot

Itamar Eliakim et al.

PLOS COMPUTATIONAL BIOLOGY (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Learning to Localize Sound Source in Visual Scenes

Arda Senocak et al.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2018)

Review Automation & Control Systems

Localization of sound sources in robotics: A review

Caleb Rascon et al.

ROBOTICS AND AUTONOMOUS SYSTEMS (2017)

Proceedings Paper Computer Science, Artificial Intelligence

3D Room Geometry Reconstruction Using Audio-Visual Sensors

Hansung Kim et al.

PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON 3D VISION (3DV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Eddy Ilg et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Engineering, Electrical & Electronic

Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks

Arsalan Mousavian et al.

PROCEEDINGS OF 2016 FOURTH INTERNATIONAL CONFERENCE ON 3D VISION (3DV) (2016)

Article Computer Science, Artificial Intelligence

A survey on sound source localization in robotics: From binaural to array processing methods

S. Argentieri et al.

COMPUTER SPEECH AND LANGUAGE (2015)

Article Neurosciences

Sound localization with head movement: implications for 3-d audio displays

Ken I. McAnally et al.

FRONTIERS IN NEUROSCIENCE (2014)

Article Robotics

Vision meets robotics: The KITTI dataset

A. Geiger et al.

INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH (2013)

Article Multidisciplinary Sciences

Acoustic echoes reveal room shape

Ivan Dokmanic et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2013)

Article Acoustics

Inference of Room Geometry From Acoustic Impulse Responses

Fabio Antonacci et al.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2012)

Article Computer Science, Artificial Intelligence

LabelMe: A database and web-based tool for image annotation

Bryan C. Russell et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2008)

Article Engineering, Electrical & Electronic

Kalman filters for time delay of arrival-based source localization

Ulrich Klee et al.

EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING (2006)