Journal
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
Volume 14, Issue 6
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3623405
Keywords
Compound agent learning; deep reinforcement learning; policy fusion; dynamic weights; prior reward
We propose a new method for policy fusion in deep reinforcement learning, which dynamically selects sub-tasks and reduces fusion bias. Experimental results show significant improvements in task duration, episode reward, and score difference.
In the Deep Reinforcement Learning (DRL) domain, a compound learning task is often decomposed into several sub-tasks in a divide-and-conquer manner, each trained separately and then fused to achieve the original task, a process referred to as policy fusion. However, state-of-the-art (SOTA) policy fusion methods treat the importance of sub-tasks equally throughout the task process, eliminating the possibility of the agent relying on different sub-tasks at different stages. To address this limitation, we propose a generic policy fusion approach, referred to as Policy Fusion Learning with Dynamic Weights and Prior Reward (PFLDWPR), to automate the time-varying selection of sub-tasks. Specifically, PFLDWPR produces a time-varying one-hot vector over sub-tasks to dynamically select a suitable sub-task and mask the rest throughout the entire task process, enabling the fused strategy to optimally guide the agent in executing the compound task. The sub-tasks weighted by the dynamic one-hot vector are then aggregated to obtain the action policy for the original task. Moreover, we collect sub-tasks' rewards at the pre-training stage as a prior reward, which, along with the current reward, is used to train the policy fusion network. This approach thus reduces fusion bias by leveraging prior experience. Experimental results on three popular learning tasks demonstrate that the proposed method significantly improves on three SOTA policy fusion methods in terms of task duration, episode reward, and score difference.
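The fusion mechanism the abstract describes — a time-varying one-hot vector that selects one sub-task policy and masks the rest, followed by aggregation into a single action policy — can be sketched in a few lines. This is an illustrative reconstruction from the abstract alone, not the authors' implementation: the sub-task labels, the greedy action selection, and the `beta` weighting of the prior reward are all assumptions.

```python
import numpy as np

def one_hot(index, k):
    """Time-varying selector: a one-hot vector over the K sub-tasks."""
    v = np.zeros(k)
    v[index] = 1.0
    return v

def fuse_policies(sub_action_probs, weights):
    """Aggregate sub-task action distributions under a one-hot weight vector.

    sub_action_probs: (K, A) array, one action distribution per sub-task.
    weights: (K,) one-hot vector; the selected sub-task keeps its policy,
             the masked sub-tasks contribute nothing.
    """
    fused = weights @ sub_action_probs   # weighted mix of sub-policies
    return fused / fused.sum()           # renormalize to a distribution

# Three hypothetical pre-trained sub-task policies over four actions.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],    # sub-task 0
    [0.10, 0.70, 0.10, 0.10],    # sub-task 1
    [0.25, 0.25, 0.25, 0.25],    # sub-task 2
])

# At this timestep the fusion network selects sub-task 1 and masks the rest.
w = one_hot(1, 3)
policy = fuse_policies(probs, w)
action = int(np.argmax(policy))  # greedy action from the fused policy

# Prior-reward shaping (assumed form): the reward used to train the fusion
# network combines the current reward with the sub-task's pre-training reward.
beta = 0.5                        # assumed mixing coefficient
current_reward, prior_reward = 1.0, 0.4
training_reward = current_reward + beta * prior_reward
```

With a one-hot selector the fused policy is exactly the selected sub-task's policy; a learned soft weight vector would instead blend sub-tasks, which is the fusion bias the one-hot masking avoids.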