Article

Distributed consensus-based multi-agent temporal-difference learning

AUTOMATICA (2023)

Journal

AUTOMATICA

Volume 151

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.automatica.2023.110922

Categories

Automation & Control Systems; Engineering, Electrical & Electronic

Summary and abstract

This paper proposes two new distributed consensus-based algorithms for temporal-difference learning in multi-agent Markov decision processes. The algorithms are off-policy and approximate the value function linearly. Each agent observes only its local data and communicates only within a small neighborhood, so each algorithm combines local updates of the parameter estimates with a dynamic consensus scheme run over a time-varying communication network. The algorithms are completely decentralized, which allows efficient parallelization and covers scenarios where agents have different behavior policies and initial state distributions while evaluating a common target policy.

In this paper we propose two new distributed consensus-based algorithms for temporal-difference learning in multi-agent Markov decision processes. The algorithms are of off-policy type and are aimed at linear approximation of the value function. Restricting agents' observations to local data and communications to their small neighborhoods, the algorithms consist of: (a) local updates of the parameter estimates based on either the standard TD(λ) or the emphatic ETD(λ) algorithm, and (b) dynamic consensus scheme implemented over a time-varying lossy communication network. The algorithms are completely decentralized, allowing efficient parallelization and applications where the agents may have different behavior policies and different initial state distributions while evaluating a common target policy. It is proved under nonrestrictive assumptions that the proposed algorithms weakly converge to the solutions of the mean ordinary differential equation (ODE) common for all the agents. It is also proved that the whole system may be stabilized by a proper choice of the network and that the parameter estimates weakly converge to consensus. Discussion is given on the asymptotic bias and variance of the estimates, on the projected forms of the proposed algorithms, as well as on restrictiveness of the adopted assumptions. Simulation results illustrate the main properties of the algorithms and provide comparisons with similar schemes. © 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
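The two-part structure described in the abstract, a local TD(λ) update followed by a consensus step over a lossy network, can be illustrated with a minimal sketch. This is not the paper's algorithm: the trace convention (one common form of off-policy TD(λ) with importance ratios), the step sizes, the order of the two steps, and the way dropped links are modeled are all assumptions made for illustration, and the function names `local_td_step`, `consensus_step` and `lossy_weights` are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def local_td_step(theta, trace, phi, phi_next, reward, rho,
                  alpha=0.05, gamma=0.9, lam=0.5):
    """One off-policy TD(lambda) step with linear features for a single agent.

    Assumed convention: rho = pi(a|s) / b_i(a|s) is the importance ratio
    between the common target policy and this agent's behavior policy.
    """
    # Decay the eligibility trace, add the current features, scale by rho.
    trace = [rho * (gamma * lam * e + f) for e, f in zip(trace, phi)]
    # TD error under the linear value estimate v(s) = theta . phi(s).
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    theta = [t + alpha * delta * e for t, e in zip(theta, trace)]
    return theta, trace


def consensus_step(thetas, weights):
    """Each agent i replaces its estimate by sum_j weights[i][j] * thetas[j].

    `weights` is assumed row-stochastic, with weights[i][j] > 0 only for
    agents j that agent i can currently hear.
    """
    n, d = len(thetas), len(thetas[0])
    return [[sum(weights[i][j] * thetas[j][k] for j in range(n))
             for k in range(d)]
            for i in range(n)]


def lossy_weights(base, drop_prob, rng):
    """Model a lossy link (assumption): a dropped link's weight moves to the
    receiver's own estimate, so every row still sums to one."""
    n = len(base)
    w = [row[:] for row in base]
    for i in range(n):
        for j in range(n):
            if i != j and w[i][j] > 0 and rng.random() < drop_prob:
                w[i][i] += w[i][j]
                w[i][j] = 0.0
    return w
```

In a simulation loop, each agent would call `local_td_step` on its own transition, after which `consensus_step` would be applied with a fresh matrix from `lossy_weights` to mimic a time-varying network. An ETD(λ) variant would additionally track the emphasis weights, which this sketch omits.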
