Article

QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus plus Innovations

Journal

IEEE Transactions on Signal Processing
Volume 61, Issue 7, Pages 1848-1862

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TSP.2013.2241057

Keywords

Collaborative network processing; consensus plus innovations; distributed Q-learning; mixed time-scale dynamics; multi-agent stochastic control; reinforcement learning

Funding

  1. National Science Foundation [CCF-1011903, DMS-1118605]
  2. Air Force Office of Scientific Research [FA-95501010291]
  3. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations [1018509]
  4. National Science Foundation, Directorate for Mathematical & Physical Sciences, Division of Mathematical Sciences [1118605]


The paper develops QD-learning, a distributed version of reinforcement Q-learning, for multi-agent Markov decision processes (MDPs); the agents have no prior information on the global state transition and on the local agent cost statistics. The network agents minimize a network-averaged infinite horizon discounted cost, by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. When each agent is aware only of its local online cost data and the interagent communication network is weakly connected, we prove that QD-learning, a consensus+innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
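The consensus+innovations structure described above can be illustrated with a minimal numerical sketch: each agent nudges its local Q-table toward its neighbors' tables (consensus) while also applying a standard Q-learning correction built from its own noisy one-stage cost (innovation), with the consensus weight decaying more slowly than the innovation weight so that agreement dominates asymptotically. The MDP sizes, ring communication graph, step-size exponents, and noise level below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: 3 states, 2 actions, 4 agents on a ring graph.
S, A, N = 3, 2, 4
gamma = 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))            # shared transition kernel P[s, a] -> dist over s'
mean_cost = rng.uniform(0, 1, size=(N, S, A))          # each agent's mean one-stage cost
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}  # sparse ring network

Q = np.zeros((N, S, A))                                # one Q-table per agent
s = 0
for t in range(1, 5001):
    alpha = 1.0 / t                 # innovation weight (faster decay)
    beta = min(0.45, t ** -0.6)     # consensus weight (slower decay: beta/alpha -> infinity)
    a = rng.integers(A)             # exploratory random action from the remote controller
    s_next = rng.choice(S, p=P[s, a])
    Q_old = Q.copy()
    for i in range(N):
        c = mean_cost[i, s, a] + 0.1 * rng.standard_normal()   # agent i's noisy local cost
        consensus = sum(Q_old[i, s, a] - Q_old[j, s, a] for j in neighbors[i])
        innovation = c + gamma * Q_old[i, s_next].min() - Q_old[i, s, a]
        Q[i, s, a] = Q_old[i, s, a] - beta * consensus + alpha * innovation
    s = s_next

# After training, the agents' Q-tables should be nearly identical (consensus),
# each approximating the optimal Q-function of the network-averaged cost.
disagreement = max(np.abs(Q[i] - Q.mean(axis=0)).max() for i in range(N))
print(f"max inter-agent disagreement: {disagreement:.4f}")
```

The mixed time-scale choice is the key design point: because `beta` decays more slowly than `alpha`, the disagreement among agents is squashed faster than new local information is injected, which is what lets purely local cost observations yield a common network-wide value function.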

Authors

