Article

Antipodal-points-aware dual-decoding network for robotic visual grasp detection oriented to multi-object clutter scenes

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 230

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.120545

Keywords

Robotic grasping detection; Grasping representation; RGB-D fusion; Multi-object scene

This paper proposes an antipodal-points grasping representation model and presents a new network, APDNet, for grasp detection in multi-object scenes. The proposed method achieves state-of-the-art performance with a well-balanced trade-off between accuracy and efficiency, as demonstrated on a public dataset and a real robot platform.
Detecting grasps with high accuracy and efficiency is challenging for robots in multi-object clutter scenes, especially scenes containing objects with large scale differences. Effective grasping representation, full utilization of data, and formulation of grasping strategies are critical to solving this problem. To this end, this paper proposes an antipodal-points grasping representation model. Based on this model, the Antipodal-Points-aware Dual-decoding Network (APDNet) is presented for grasp detection in multi-object scenes. APDNet employs an encoding-decoding architecture. In the encoder, a shared encoding strategy based on an Adaptive Gated Fusion Module (AGFM) is proposed to fuse RGB-D multimodal data. Two decoding branches, StartpointNet and EndpointNet, are presented to detect antipodal points. To better attend to objects at different scales in multi-object scenes, a global multi-view cumulative attention mechanism, called the Global Accumulative Attention Mechanism (GAAM), is also designed for StartpointNet. The proposed method is comprehensively validated and compared on a public dataset and a real robot platform. On the GraspNet-1Billion dataset, the proposed method achieves 30.7%, 26.4%, and 12.7% accuracy at 88.4 FPS for seen, unseen, and novel objects, respectively. On the AUBO robot platform, the detection and grasp success rates are 100.0% and 95.0% in single-object scenes and 97.0% and 90.3% in multi-object scenes, respectively. These results demonstrate that the proposed method achieves state-of-the-art performance with a well-balanced trade-off between accuracy and efficiency.
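To make the abstract's architecture concrete, the following is a minimal, hypothetical PyTorch sketch of the overall idea: a shared RGB-D encoder whose features are blended by a learned per-pixel gate (standing in for the AGFM), followed by two decoding heads that predict per-pixel heatmaps for the start point and end point of an antipodal grasp. All module names (GatedFusion, DualDecoderGraspNet), layer choices, and channel sizes are assumptions for illustration only, not the authors' APDNet implementation, and the GAAM attention mechanism is omitted.

```python
# Hypothetical sketch of a gated RGB-D fusion encoder with dual decoding
# heads for antipodal grasp points. Not the authors' APDNet code.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Blend RGB and depth features with a learned per-pixel gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        # Adaptive convex combination of the two modalities.
        return g * rgb_feat + (1.0 - g) * depth_feat


class DualDecoderGraspNet(nn.Module):
    """Shared encoding with gated fusion, then two heads for antipodal points."""
    def __init__(self, channels=32):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.depth_enc = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.fuse = GatedFusion(channels)
        # Each head predicts a per-pixel heatmap for one antipodal contact point.
        self.start_head = nn.Conv2d(channels, 1, kernel_size=1)
        self.end_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, rgb, depth):
        feat = self.fuse(self.rgb_enc(rgb), self.depth_enc(depth))
        return torch.sigmoid(self.start_head(feat)), torch.sigmoid(self.end_head(feat))


if __name__ == "__main__":
    net = DualDecoderGraspNet()
    rgb = torch.randn(1, 3, 224, 224)
    depth = torch.randn(1, 1, 224, 224)
    start_map, end_map = net(rgb, depth)
    print(start_map.shape, end_map.shape)  # two [1, 1, 224, 224] heatmaps
```

The gated blend reflects the intuition behind adaptive RGB-D fusion: the network decides per pixel how much to trust appearance versus geometry, rather than simply concatenating or averaging the two streams.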
