Article

Distributional generative adversarial imitation learning with reproducing kernel generalization

Journal

NEURAL NETWORKS
Volume 165, Pages 43-59

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2023.05.027

Keywords

Generative adversarial imitation learning; Policy generalization; Computational properties; Distributional reinforcement learning

Abstract

Generative adversarial imitation learning (GAIL) regards imitation learning (IL) as a distribution matching problem between the state-action distributions of the expert policy and the learned policy. In this paper, we focus on the generalization and computational properties of policy classes. We prove that generalization can be guaranteed in GAIL when the class of policies is well controlled. With the capability of policy generalization, we introduce distributional reinforcement learning (RL) into GAIL and propose the greedy distributional soft gradient (GDSG) algorithm to solve GAIL. The main advantages of GDSG can be summarized as: (1) Q-value overestimation, a crucial factor leading to the instability of GAIL with off-policy training, can be alleviated by distributional RL. (2) By considering the maximum entropy objective, the policy can be improved in terms of performance and sample efficiency through sufficient exploration. Moreover, GDSG attains a sublinear convergence rate to a stationary solution. Comprehensive experimental verification in MuJoCo environments shows that GDSG can mimic expert demonstrations better than previous GAIL variants. © 2023 Elsevier Ltd. All rights reserved.
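
The abstract highlights two ingredients of GDSG: a distributional critic (to curb Q-value overestimation) and a maximum-entropy objective (to encourage exploration), with rewards supplied by the GAIL discriminator. The following is a minimal, hypothetical PyTorch sketch of how those two pieces commonly look in code; it is not the authors' implementation, and the quantile count, temperature ALPHA, network sizes, and the discriminator-based reward surrogate mentioned in the comments are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' GDSG code) of a quantile-based
# distributional critic with an entropy-regularized ("soft") Bellman target.
# N_QUANTILES, ALPHA and the network sizes are assumptions for illustration.
import torch
import torch.nn as nn

N_QUANTILES = 32   # number of quantile atoms of the return distribution (assumed)
ALPHA = 0.2        # entropy temperature (assumed)
GAMMA = 0.99       # discount factor

class QuantileCritic(nn.Module):
    """Predicts N_QUANTILES quantiles of the return distribution Z(s, a)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_QUANTILES),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # shape (batch, N_QUANTILES)

def quantile_huber_loss(pred, target, kappa=1.0):
    """Quantile-regression Huber loss between predicted and target quantile sets."""
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES
    td = target.unsqueeze(1) - pred.unsqueeze(2)            # (batch, N, N) pairwise TD errors
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber).mean()

def soft_distributional_target(reward, next_quantiles, next_logprob, done):
    """Soft Bellman target: r + gamma * (Z(s', a') - alpha * log pi(a'|s')).
    In GAIL-style training the reward would come from the discriminator,
    e.g. a surrogate such as -log(1 - D(s, a))."""
    soft_next = next_quantiles - ALPHA * next_logprob.unsqueeze(-1)
    return reward.unsqueeze(-1) + GAMMA * (1.0 - done).unsqueeze(-1) * soft_next

# Shape check with random data (hypothetical MuJoCo-like dimensions).
obs, act = torch.randn(8, 17), torch.randn(8, 6)
critic = QuantileCritic(17, 6)
pred = critic(obs, act)                                      # (8, 32)
target = soft_distributional_target(torch.zeros(8), pred.detach(),
                                     torch.zeros(8), torch.zeros(8))
loss = quantile_huber_loss(pred, target)                     # scalar training loss
```

Training the full critic over the return distribution rather than a single Q-value is what the abstract credits with reducing overestimation, while the entropy bonus in the target keeps the policy exploratory.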
