Article

Energy-Based Legged Robots Terrain Traversability Modeling via Deep Inverse Reinforcement Learning

Journal

IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 7, Issue 4, Pages 8807-8814

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2022.3188100

Keywords

Energy and environment-aware automation; legged robots; learning from demonstration

Funding

  1. Toyota Research Institute
  2. MIT Biomimetic Robotics Lab
  3. NAVER LABS
  4. National Science Foundation (NSF), Division of Computer and Network Systems, Directorate for Computer & Information Science & Engineering [2118818]

Abstract

This work reports on developing a deep inverse reinforcement learning method for legged robot terrain traversability modeling that incorporates both exteroceptive and proprioceptive sensory data. Existing works use robot-agnostic exteroceptive environmental features or handcrafted kinematic features; instead, we propose to also learn robot-specific inertial features from proprioceptive sensory data for reward approximation in a single deep neural network. Incorporating the inertial features can improve the model fidelity and provide a reward that depends on the robot's state during deployment. We train the reward network using the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) algorithm and propose simultaneously minimizing a trajectory ranking loss to deal with the suboptimality of legged robot demonstrations. The demonstrated trajectories are ranked by locomotion energy consumption in order to learn an energy-aware reward function and a more energy-efficient policy than the demonstrations. We evaluate our method using a dataset collected by an MIT Mini-Cheetah robot and a Mini-Cheetah simulator. The code is publicly available (1).
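
To make the training objective concrete, below is a minimal, hypothetical PyTorch sketch of a combined MEDIRL-plus-ranking loss of the kind the abstract describes. RewardNet, medirl_surrogate, energy_ranking_loss, and every shape and hyperparameter here are illustrative assumptions, not the authors' released implementation (see the paper's code link for that).

```python
# Sketch: one reward network over stacked exteroceptive + proprioceptive
# (inertial) feature maps, trained with the MEDIRL gradient plus a
# pairwise trajectory ranking loss on locomotion energy. All names,
# shapes, and the architecture are assumptions for illustration.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Fully convolutional map from per-cell features to per-cell reward."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (1, C, H, W) -> reward map: (H, W);
        # a batch of one terrain map is assumed for clarity.
        return self.body(features).squeeze(1).squeeze(0)

def medirl_surrogate(reward_map, demo_svf, expected_svf):
    # MaxEnt IRL: the log-likelihood gradient w.r.t. the reward equals
    # (demonstration state-visitation frequencies - expected SVF under
    # the current reward). Detaching that difference and multiplying by
    # the reward map gives a scalar whose autograd gradient matches it,
    # so minimizing this surrogate ascends the demonstration likelihood.
    return -((demo_svf - expected_svf).detach() * reward_map).sum()

def energy_ranking_loss(reward_map, traj_low_energy, traj_high_energy,
                        margin: float = 1.0):
    # Trajectories are (N, 2) long tensors of visited (row, col) cells.
    # The lower-energy demonstration should accumulate more reward.
    r_low = reward_map[traj_low_energy[:, 0], traj_low_energy[:, 1]].sum()
    r_high = reward_map[traj_high_energy[:, 0], traj_high_energy[:, 1]].sum()
    return torch.clamp(margin - (r_low - r_high), min=0.0)

# Combined objective (the weight lam on the ranking term is assumed):
# loss = medirl_surrogate(r, demo_svf, exp_svf) \
#      + lam * energy_ranking_loss(r, traj_a, traj_b)
```

Detaching the state-visitation difference is the standard way to realize the analytic MaxEnt IRL gradient through autograd; the margin ranking term is what allows the learned reward to prefer lower-energy behavior than the suboptimal demonstrations.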

