Article

When Transformer Meets Robotic Grasping: Exploits Context for Efficient Grasp Detection

Journal

IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 7, Issue 3, Pages 8170-8177

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2022.3187261

Keywords

Grasp detection; robotic grasping; vision transformer

Funding

  1. National Natural Science Foundation of China [U2013601, 62173314]


TF-Grasp is a transformer-based architecture for robotic grasp detection. It combines local window and cross-window attention mechanisms to capture both local and global information, and uses multi-scale feature fusion to improve accuracy. Experimental results demonstrate that TF-Grasp achieves competitive performance on several datasets and grasps reliably in real-world scenarios.
In this letter, we present a transformer-based architecture, namely TF-Grasp, for robotic grasp detection. The developed TF-Grasp framework has two key designs that make it well suited to visual grasping tasks. First, we adopt local window attention to capture local contextual information and detailed features of graspable objects, and then apply cross-window attention to model long-range dependencies between distant pixels; object knowledge, environmental configuration, and the relationships between different visual entities are thereby aggregated for subsequent grasp detection. Second, we build a hierarchical encoder-decoder architecture with skip connections, delivering shallow features from the encoder to the decoder to enable multi-scale feature fusion. Owing to its attention mechanism, TF-Grasp can simultaneously capture local information (e.g., object contours) and model long-range connections, such as the relationships between distinct visual concepts in clutter. Extensive computational experiments demonstrate that TF-Grasp achieves competitive results against state-of-the-art convolutional grasping models, attaining accuracies of 97.99% and 94.6% on the Cornell and Jacquard grasping datasets, respectively. Real-world experiments with a 7-DoF Franka Emika Panda robot further demonstrate its capability of grasping unseen objects in a variety of scenarios.
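The two attention mechanisms named in the abstract, attention within local windows and attention across windows, can be illustrated with a short sketch. This is not the authors' implementation: it is a minimal single-head version with identity Q/K/V projections, and it approximates cross-window attention by cyclically shifting the feature map before windowed attention (in the style of shifted-window transformers); the function name `window_attention` and its parameters are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(feat, win=4, shift=0):
    """Self-attention within non-overlapping windows of a feature map.

    feat : (H, W, C) feature map, with H and W divisible by `win`.
    win  : window side length; attention is restricted to each window.
    shift: cyclic shift applied before windowing; a nonzero shift lets
           tokens from neighboring windows mix (cross-window attention).
    """
    H, W, C = feat.shape
    x = np.roll(feat, (-shift, -shift), axis=(0, 1)) if shift else feat
    out = np.empty_like(x)
    for i in range(0, H, win):
        for j in range(0, W, win):
            tokens = x[i:i + win, j:j + win].reshape(-1, C)  # (win*win, C)
            # single-head scaled dot-product attention, identity projections
            scores = tokens @ tokens.T / np.sqrt(C)
            mixed = softmax(scores) @ tokens
            out[i:i + win, j:j + win] = mixed.reshape(win, win, C)
    return np.roll(out, (shift, shift), axis=(0, 1)) if shift else out

feat = np.random.rand(8, 8, 16)
local = window_attention(feat, win=4)           # local window attention
cross = window_attention(feat, win=4, shift=2)  # shifted: cross-window mixing
```

In the full model these layers would be stacked inside the hierarchical encoder-decoder described above, with skip connections carrying encoder features to the decoder for multi-scale fusion.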
