Journal
IEEE TRANSACTIONS ON SIGNAL PROCESSING
Volume 61, Issue 7, Pages 1848-1862
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSP.2013.2241057
Keywords
Collaborative network processing; consensus plus innovations; distributed Q-learning; mixed time-scale dynamics; multi-agent stochastic control; reinforcement learning
Funding
- National Science Foundation [CCF-1011903, DMS-1118605, 1018509]
- Air Force Office of Scientific Research [FA-95501010291]
The paper develops QD-learning, a distributed version of reinforcement Q-learning, for multi-agent Markov decision processes (MDPs) in which the agents have no prior information on the global state transition statistics or on the local agent cost statistics. The network agents minimize a network-averaged infinite-horizon discounted cost by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and to the control actions of a remote controller. When each agent is aware only of its local online cost data and the interagent communication network is weakly connected, we prove that QD-learning, a consensus + innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
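The abstract describes each agent's update as a combination of a consensus term (pulling its Q-table toward those of its neighbors) and an innovation term (a local Q-learning correction from its own observed one-stage cost), with the two step sizes decaying at different rates. A minimal sketch of that update on a toy MDP is given below; it is not the authors' code, and all concrete names and parameters (`n_states`, `n_actions`, `alpha`, `beta`, `gamma`, the complete-graph topology, the synthetic costs) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem sizes and discount factor (illustrative assumptions).
n_agents, n_states, n_actions = 3, 4, 2
gamma = 0.9
# Complete communication graph for the demo; the paper only requires
# weak connectivity of the (possibly stochastic) network.
adjacency = np.ones((n_agents, n_agents)) - np.eye(n_agents)

# One Q-table per agent, maintained and updated locally.
Q = np.zeros((n_agents, n_states, n_actions))

def qd_step(Q, x, u, x_next, costs, alpha, beta):
    """One consensus + innovations update at the visited pair (x, u).

    costs[n] is agent n's observed one-stage random cost; alpha and beta
    are the innovation and consensus step sizes, whose different decay
    rates produce the mixed time-scale dynamics the paper analyzes.
    """
    Q_new = Q.copy()
    for n in range(n_agents):
        # Consensus: disagreement with neighbors' Q-values.
        consensus = sum(adjacency[n, l] * (Q[n, x, u] - Q[l, x, u])
                        for l in range(n_agents))
        # Innovation: local temporal-difference correction.
        innovation = costs[n] + gamma * Q[n, x_next].min() - Q[n, x, u]
        Q_new[n, x, u] = Q[n, x, u] - beta * consensus + alpha * innovation
    return Q_new

# Run a few thousand steps on a toy MDP with uniform random transitions.
x = 0
for t in range(2000):
    u = int(rng.integers(n_actions))
    x_next = int(rng.integers(n_states))
    # Heterogeneous local random costs across agents.
    costs = rng.random(n_agents) + 0.1 * np.arange(n_agents)
    # Step sizes decay at different rates: consensus decays more slowly,
    # so agreement asymptotically dominates the local innovations.
    alpha, beta = 1.0 / (t + 2), 1.0 / (t + 2) ** 0.6
    Q = qd_step(Q, x, u, x_next, costs, alpha, beta)
    x = x_next

# The agents' Q-tables should have drawn close to one another (consensus).
spread = float(np.abs(Q - Q.mean(axis=0)).max())
print(spread < 0.5)
```

Because the agents only ever exchange Q-values with neighbors, the same loop works on any weakly connected `adjacency` matrix; the complete graph is used here purely to keep the demo short.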