Article

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

Journal

IEEE Transactions on Robotics
Volume 36, Issue 3, Pages 582-596

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TRO.2019.2959445

Keywords

Task analysis; Haptic interfaces; Visualization; Robot sensing systems; Solid modeling; Reinforcement learning; Deep learning in robotics and automation; perception for grasping and manipulation; sensor fusion; sensor-based control

Funding

  1. JD.com American Technologies Corporation (JD) under the SAIL-JD AI Research Initiative
  2. Toyota Research Institute (TRI)

Abstract

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.
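
To make the representation-learning idea concrete, below is a minimal PyTorch sketch of a multimodal encoder that fuses a camera image and a window of force/torque readings into a single compact latent vector, trained with a self-supervised alignment objective (is this image/force pair from the same time step?). The class name, layer sizes, input shapes, and the single alignment head are illustrative assumptions, not the authors' architecture; the paper studies several self-supervised objectives and feeds the learned representation to a reinforcement-learning policy.

```python
# Minimal sketch (assumed architecture, not the paper's exact model) of a
# vision + haptics encoder with a self-supervised alignment head.
import torch
import torch.nn as nn


class MultimodalEncoder(nn.Module):
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        # Vision branch: small CNN over 128x128 RGB frames (size assumed).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Haptic branch: MLP over a window of 32 six-axis force/torque samples.
        self.haptics = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Fusion into one compact representation usable as a policy input.
        self.fuse = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim), nn.ReLU(),
        )
        # Self-supervised head: predict whether the two modalities are
        # temporally aligned (1) or come from shuffled time steps (0).
        self.align_head = nn.Linear(latent_dim, 1)

    def forward(self, image, force):
        z = self.fuse(torch.cat([self.vision(image), self.haptics(force)], dim=-1))
        return z, self.align_head(z)


if __name__ == "__main__":
    enc = MultimodalEncoder()
    img = torch.randn(8, 3, 128, 128)             # batch of RGB frames
    ft = torch.randn(8, 32, 6)                    # batch of force/torque windows
    labels = torch.randint(0, 2, (8, 1)).float()  # 1 = aligned pair, 0 = shuffled
    z, logits = enc(img, ft)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()
    print(z.shape, loss.item())
```

In this sketch the fused latent z is what a downstream controller would consume in place of raw pixels and force signals, which is how a compact multimodal representation can reduce the sample complexity of policy learning.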

