Generating attentive goals for prioritized hindsight reinforcement learning

KNOWLEDGE-BASED SYSTEMS (2020)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 203

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2020.106140

Keywords

Attentive goals generation; Prioritized hindsight model; Hindsight experience replay; Reinforcement learning

Funding

National Natural Science Foundation of China [61671175]
Sichuan Science and Technology Program, China [2019YFS0069]
Lab of Space Optoelectronic Measurement & Perception, China [LabSOMP-2018-01]

Abstract

Typical reinforcement learning (RL) performs a single task and does not scale to problems in which an agent must perform multiple tasks, such as moving a robot arm to different locations. The multi-goal framework extends typical RL using a goal-conditional value function and policy, whereby the agent pursues different goals in different episodes. By treating a virtual goal as the desired one, and thereby giving the agent rewards more frequently, hindsight experience replay has achieved promising results in the sparse-reward setting of multi-goal RL. However, these virtual goals are sampled uniformly from the states that follow the replayed state in stored experiences, regardless of their significance. We propose a novel prioritized hindsight model for multi-goal RL in which the agent is provided with more valuable goals, as measured by the expected temporal-difference (TD) error. An attentive goals generation (AGG) network, which consists of temporal convolutions, multi-head dot-product attention, and a last-attention network, is designed to generate the virtual goals to replay. The AGG network is trained by following the gradient of the TD error calculated by an actor-critic model, and generates goals that maximize the expected TD error on replayed transitions. The whole network is fully differentiable and can be learned in an end-to-end manner. The proposed method is evaluated on several robotic manipulation tasks and demonstrates improved sample efficiency and performance. © 2020 Elsevier B.V. All rights reserved.
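
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' AGG network. It compares uniform hindsight relabeling with a simplified prioritized variant that picks, from the same candidate goals, the one with the largest absolute TD error. All names here (`sparse_reward`, `future_goals`, `td_error`, `relabel_uniform`, `relabel_prioritized`, `zero_critic`) and the toy one-dimensional episode are hypothetical, and the critic is a stand-in state-value function rather than a trained actor-critic model.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward: 0 if the achieved position is within tol of the goal, else -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def future_goals(episode, t):
    """Candidate virtual goals: positions achieved at or after step t's outcome."""
    return [step["achieved_next"] for step in episode[t:]]


def td_error(q, transition, goal, gamma=0.98):
    """Absolute one-step TD error of a transition relabeled with `goal`.

    `q(state, goal)` is a hypothetical value estimate standing in for the critic.
    """
    reward = sparse_reward(transition["achieved_next"], goal)
    target = reward + gamma * q(transition["next_state"], goal)
    return abs(target - q(transition["state"], goal))


def relabel_uniform(episode, t, rng):
    """Uniform hindsight relabeling: every candidate goal is equally likely."""
    return rng.choice(future_goals(episode, t))


def relabel_prioritized(episode, t, q):
    """Simplified prioritization: choose the candidate with the largest TD error."""
    return max(future_goals(episode, t), key=lambda g: td_error(q, episode[t], g))


# Toy episode: an arm tip moving along one axis (assumed data, for illustration only).
positions = [(0.0,), (0.1,), (0.2,), (0.3,)]
episode = [
    {"state": positions[i], "next_state": positions[i + 1], "achieved_next": positions[i + 1]}
    for i in range(3)
]


def zero_critic(state, goal):
    """An untrained critic that predicts 0 everywhere."""
    return 0.0


uniform_goal = relabel_uniform(episode, 0, random.Random(0))
prioritized_goal = relabel_prioritized(episode, 0, zero_critic)
print("uniform:", uniform_goal, "prioritized:", prioritized_goal)
```

The sketch selects among goals that already occurred in the episode. The method described in the abstract instead generates goals with a learned network trained on the gradient of the TD error, which this example does not attempt to reproduce.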
