Article

Manifold-based multi-objective policy search with sample reuse

NEUROCOMPUTING (2017)

Journal

NEUROCOMPUTING

Volume 263, Issue -, Pages 3-14

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2016.11.094

Keywords

Multi-objective; Reinforcement learning; Policy search; Black-box optimization; Importance sampling

Category

Computer Science, Artificial Intelligence

Funding

DFG grant within the priority program Autonomous learning [SPP1527]

Abstract

Many real-world applications are characterized by multiple conflicting objectives. In such problems optimality is replaced by Pareto optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. Building on recent advances in reinforcement learning and multi-objective policy search, we present two novel manifold-based algorithms to solve multi-objective Markov decision processes. These algorithms combine episodic exploration strategies and importance sampling to efficiently learn a manifold in the policy parameter space such that its image in the objective space accurately approximates the Pareto frontier. We show that episode-based approaches and importance sampling can lead to significantly better results in the context of multi-objective reinforcement learning. Evaluated on three multi-objective problems, our algorithms outperform state-of-the-art methods both in terms of quality of the learned Pareto frontier and sample efficiency. (C) 2017 Elsevier B.V. All rights reserved.
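The two ideas the abstract relies on, Pareto optimality and sample reuse through importance sampling, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names are hypothetical, objectives are assumed to be maximized, and the sampling distribution is assumed to be a one-dimensional Gaussian given as a (mean, std) pair.

```python
import math


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed form of the sampling distribution)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def reweighted_return(samples, returns, old, new):
    """Self-normalized importance sampling estimate of the expected return
    under `new`, reusing samples that were drawn from `old`.
    `old` and `new` are hypothetical (mean, std) pairs."""
    weights = [gaussian_pdf(s, *new) / gaussian_pdf(s, *old) for s in samples]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total


if __name__ == "__main__":
    # Five candidate solutions evaluated on two objectives.
    evaluated = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
    print(pareto_front(evaluated))

    # Reuse two old samples to estimate the return under a shifted distribution.
    print(reweighted_return([0.0, 1.0], [2.0, 4.0], (0.0, 1.0), (1.0, 1.0)))
```

In this toy setting, the reweighting lets returns collected under an earlier sampling distribution inform the estimate for a new one without collecting fresh samples, which is the general sense in which importance sampling enables sample reuse.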
