Journal
IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 7, Issue 4, Pages 8807-8814
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2022.3188100
Keywords
Energy and environment-aware automation; legged robots; learning from demonstration
Funding
- Toyota Research Institute
- MIT Biomimetic Robotics Lab
- NAVER LABS
- National Science Foundation (NSF), Division of Computer and Network Systems, Directorate for Computer & Information Science & Engineering [2118818]
Abstract
This work reports on developing a deep inverse reinforcement learning method for legged robot terrain traversability modeling that incorporates both exteroceptive and proprioceptive sensory data. Existing works use robot-agnostic exteroceptive environmental features or handcrafted kinematic features; instead, we propose to also learn robot-specific inertial features from proprioceptive sensory data for reward approximation in a single deep neural network. Incorporating the inertial features improves model fidelity and provides a reward that depends on the robot's state during deployment. We train the reward network using the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) algorithm and propose simultaneously minimizing a trajectory ranking loss to deal with the suboptimality of legged robot demonstrations. The demonstrated trajectories are ranked by locomotion energy consumption, in order to learn an energy-aware reward function and a policy that is more energy-efficient than the demonstrations. We evaluate our method using a dataset collected by an MIT Mini-Cheetah robot and a Mini-Cheetah simulator. The code is publicly available.(1)
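The abstract combines two training signals: a maximum-entropy IRL gradient that matches expert state-visitation frequencies, and a pairwise ranking term over trajectories ordered by energy consumption. The toy sketch below illustrates that combination in the simplest possible setting; it is not the paper's code. The 1-D chain environment, the linear reward in place of a deep network, the demo trajectories, the energy values, and all function names are illustrative assumptions.

```python
import numpy as np

# Toy MEDIRL + trajectory-ranking sketch on a 1-D chain "terrain" with K cells.
# Linear reward r(s) = phi(s) . theta stands in for the paper's deep network.
rng = np.random.default_rng(0)
K = 5                          # number of states (terrain cells)
phi = np.eye(K)                # one-hot state features (stand-in for learned features)
theta = np.zeros(K)            # reward parameters

# Two hypothetical demonstration trajectories (sequences of visited states),
# ranked by a made-up locomotion energy cost (lower is better).
demos = [np.array([0, 1, 2, 2, 3]), np.array([0, 1, 1, 2, 4])]
energy = [3.0, 5.0]

def expert_svf(demos, K):
    """Empirical state-visitation frequencies of the demonstrations."""
    counts = np.zeros(K)
    for d in demos:
        for s in d:
            counts[s] += 1
    return counts / sum(len(d) for d in demos)

def expected_svf(r, K, T=5):
    """Expected visitation under a soft (max-ent) policy on the chain:
    from each state, step left/stay/right with softmax(reward) weights."""
    mu = np.zeros(K); mu[0] = 1.0       # all trajectories start in state 0
    total = np.zeros(K)
    for _ in range(T):
        total += mu
        nxt = np.zeros(K)
        for s in range(K):
            nbrs = [max(s - 1, 0), s, min(s + 1, K - 1)]
            w = np.exp(r[nbrs] - r[nbrs].max())
            w /= w.sum()
            for n, p in zip(nbrs, w):
                nxt[n] += mu[s] * p
        mu = nxt
    return total / T

def ranking_grad(theta, demos, energy):
    """Pairwise logistic ranking gradient: the lower-energy trajectory should
    receive higher cumulative reward (a T-REX-style preference surrogate)."""
    (da, ea), (db, eb) = (demos[0], energy[0]), (demos[1], energy[1])
    if ea > eb:                          # make da the preferred (lower-energy) demo
        da, db = db, da
    fa, fb = phi[da].sum(0), phi[db].sum(0)    # cumulative feature counts
    p = 1.0 / (1.0 + np.exp(-(fa - fb) @ theta))   # P(da preferred)
    return (1.0 - p) * (fa - fb)         # ascent direction on log-likelihood

mu_e = expert_svf(demos, K)
lr = 0.5
for _ in range(200):
    r = phi @ theta
    grad_medirl = mu_e - expected_svf(r, K)        # max-ent IRL gradient
    theta += lr * (grad_medirl + 0.1 * ranking_grad(theta, demos, energy))

print(np.round(phi @ theta, 2))   # learned per-state rewards
```

The ranking term is weighted (here by an arbitrary 0.1) so it shapes the reward toward the energy ordering without overriding the visitation-matching objective.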