☆ 4.7 Article

On the Guaranteed Almost Equivalence Between Imitation Learning From Observation and Demonstration

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Volume 34, Issue 2, Pages 677-689

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2021.3099621

Keywords

Robots; Task analysis; Trajectory; Cloning; Mathematical model; Heuristic algorithms; Control theory; Generative adversarial imitation learning (GAIL); imitation learning (IL); learning from demonstration (LfD); learning from observation (LfO); reinforcement learning (RL)

Categories

Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

AI Summary

Imitation learning from observation (LfO) is preferable to imitation learning from demonstration (LfD) because it does not require expert actions. This article proves that LfO is almost equivalent to LfD in deterministic robot environments, and even in robot environments with bounded randomness. Extensive experiments show that LfO achieves performance comparable to LfD, which suggests that LfO can be applied in practice without sacrificing performance relative to LfD.

Abstract

Imitation learning from observation (LfO) is preferable to imitation learning from demonstration (LfD) because it does not need expert actions to reconstruct the expert policy from the expert data. However, previous studies imply that LfO performs worse than LfD by a large margin, which makes LfO difficult to use in practice. By contrast, this article proves that LfO is almost equivalent to LfD in the deterministic robot environment and, more generally, in the robot environment with bounded randomness. For the deterministic robot environment, we show from the perspective of control theory that the inverse dynamics disagreement between LfO and LfD approaches zero, meaning that LfO is almost equivalent to LfD. To relax the deterministic constraint and better match practical environments, we then consider bounded randomness in the robot environment and prove that the optimization targets of LfD and LfO remain almost the same in this more general setting. Extensive experiments on multiple robot tasks demonstrate empirically that LfO achieves performance comparable to LfD. In fact, most real-world robot systems are robot environments with bounded randomness (i.e., the environment this article considers). Hence, our findings greatly extend the potential of LfO and suggest that LfO can be safely applied in practice without sacrificing performance compared to LfD.
