4.6 Article

A Prioritized objective actor-critic method for deep reinforcement learning

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 33, Issue 16, Pages 10335-10349

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-021-05795-0

Keywords

Deep learning; Reinforcement learning; Learning systems; Multi-objective optimization; Actor-critic architecture

Abstract

The study introduces two actor-critic deep reinforcement learning methods, Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), to improve agent performance on complex problems. Both methods adjust agent behavior by adopting a weighted-sum scalarization of different objective functions.
An increasing number of complex problems naturally pose significant challenges in decision-making theory and reinforcement learning practice. These problems often involve multiple conflicting reward signals that inherently lead to poor exploration when the agent seeks a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which adjust agent behavior to efficiently achieve a designated goal by adopting a weighted-sum scalarization of different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weighting of the objectives. In contrast, SCMP generates a mixed policy from a set of priority weights, i.e., the generated policy draws on the knowledge of several policies (each corresponding to one priority weight) to prioritize objectives dynamically in real time. We implement our methods on top of the Asynchronous Advantage Actor-Critic (A3C) algorithm, exploiting its multithreading mechanism to dynamically balance the training intensity of different policies within a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean total reward in two complex problems: Food Collector and Seaquest.
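
The weighted-sum scalarization described in the abstract can be illustrated with a short sketch. The PyTorch snippet below is a minimal, illustrative reading of the MCSP idea (one policy head, one critic head per objective, per-objective advantages combined by priority weights) together with an SCMP-style blending of policies trained under different weights. The names MultiCriticSinglePolicy, mcsp_loss, and mix_policies, the discrete action space, and all network sizes and coefficients are our own assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class MultiCriticSinglePolicy(nn.Module):
    """Shared trunk, one policy head, and one value head per objective (MCSP-style layout)."""

    def __init__(self, obs_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_objectives)])

    def forward(self, obs):
        h = self.trunk(obs)
        logits = self.policy_head(h)
        values = torch.cat([head(h) for head in self.value_heads], dim=-1)  # [batch, n_objectives]
        return logits, values


def mcsp_loss(model, obs, actions, returns, weights, value_coef=0.5, entropy_coef=0.01):
    """Actor-critic loss where per-objective advantages are combined by weighted-sum scalarization.

    obs:     [batch, obs_dim]
    actions: [batch]          actions taken in the rollout
    returns: [batch, n_obj]   per-objective discounted returns
    weights: [n_obj]          priority weights (assumed to sum to 1)
    """
    logits, values = model(obs)
    dist = Categorical(logits=logits)

    advantages = returns - values                          # one advantage per objective
    scalar_adv = (advantages.detach() * weights).sum(-1)   # weighted-sum scalarization

    policy_loss = -(dist.log_prob(actions) * scalar_adv).mean()
    value_loss = advantages.pow(2).mean()                  # each critic fits its own objective's return
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


def mix_policies(per_weight_logits, weights):
    """SCMP-style mixing (illustrative): blend policies trained under different priority weights."""
    probs = torch.stack([torch.softmax(l, dim=-1) for l in per_weight_logits])  # [K, batch, n_actions]
    mixed = (weights.view(-1, 1, 1) * probs).sum(dim=0)
    return Categorical(probs=mixed)


# Toy usage with made-up shapes: 2 objectives, 8-dim observations, 4 discrete actions.
model = MultiCriticSinglePolicy(obs_dim=8, n_actions=4, n_objectives=2)
obs = torch.randn(32, 8)
actions = torch.randint(0, 4, (32,))
returns = torch.randn(32, 2)
weights = torch.tensor([0.7, 0.3])
mcsp_loss(model, obs, actions, returns, weights).backward()
```

In the A3C-based setup described in the abstract, each worker thread would collect its own rollouts and apply such an update asynchronously to a shared network; that machinery is omitted from this sketch.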
