Journal
IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 5, Issue 2, Pages 3612-3619
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2020.2976272
Keywords
Task analysis; Heuristic algorithms; Cost function; Planning; Uncertainty; Robots; Trajectory; Reinforcement learning; imitation learning; optimal control
Category
Funding
- Scalable Collaborative Human-Robot Learning (SCHooL) Project, a NSF National Robotics Initiative [1734633]
- NSF GRFP
- Office of Naval Research [N00014-311]
- Direct For Computer & Info Scie & Enginr [1734633] Funding Source: National Science Foundation
- Div Of Information & Intelligent Systems [1734633] Funding Source: National Science Foundation
Abstract
Reinforcement learning (RL) for robotics is challenging due to the difficulty of hand-engineering a dense cost function, which can lead to unintended behavior, and due to dynamical uncertainty, which makes exploration and constraint satisfaction challenging. We address these issues with a new model-based reinforcement learning algorithm, Safety Augmented Value Estimation from Demonstrations (SAVED), which uses supervision that only identifies task completion and a modest set of suboptimal demonstrations to constrain exploration and learn efficiently while handling complex constraints. We then compare SAVED with 3 state-of-the-art model-based and model-free RL algorithms on 6 standard simulation benchmarks involving navigation and manipulation and on a physical knot-tying task on the daVinci surgical robot. Results suggest that SAVED outperforms prior methods in terms of success rate, constraint satisfaction, and sample efficiency, making it feasible to safely learn a control policy directly on a real robot in less than an hour. For tasks on the robot, baselines succeed less than 5% of the time while SAVED has a success rate of over 75% in the first 50 training iterations. Code and supplementary material are available at https://tinyurl.com/saved-rl.
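The "supervision that only identifies task completion" described above can be read as a sparse indicator cost: zero inside the goal set and one elsewhere. A minimal sketch of that idea, assuming a Euclidean distance test with a hypothetical tolerance `tol` (an illustrative assumption, not the paper's exact formulation):

```python
import numpy as np

def sparse_completion_cost(state, goal, tol=0.05):
    """Indicator cost: 0 if `state` is within `tol` of `goal`, else 1.

    The goal-set membership test (a ball of radius `tol` around a target
    state) is an illustrative assumption for this sketch.
    """
    dist = np.linalg.norm(np.asarray(state, dtype=float) - np.asarray(goal, dtype=float))
    return 0.0 if dist <= tol else 1.0

# A trajectory accumulates cost 1 per step until the task is completed,
# so minimizing total cost encourages reaching the goal set quickly.
traj = [np.array([1.0, 0.0]), np.array([0.5, 0.0]), np.array([0.01, 0.0])]
goal = np.array([0.0, 0.0])
costs = [sparse_completion_cost(s, goal) for s in traj]
```

Under such a cost, no reward shaping is needed; the demonstrations and safety constraints the abstract mentions are what make exploration tractable despite the sparse signal.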
Authors