Journal
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 48, Issue 4, Pages 568-574
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAC.2003.809799
Keywords
asynchrony; delays; Markov decision processes (MDPs); neuro-dynamic programming
Abstract
Markov decision processes (MDPs) may involve three types of delays. First, state information, rather than being available instantaneously, may arrive with a delay (observation delay). Second, an action may take effect at a later decision stage rather than immediately (action delay). Third, the cost induced by an action may be collected after a number of stages (cost delay). We derive two results, one for constant and one for random delays, for reducing an MDP with delays to an MDP without delays, which differs only in the size of the state space. The results are based on the intuition that costs may be collected asynchronously, i.e., at a stage other than the one in which they are induced, as long as they are discounted properly.
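The constant-delay reduction described in the abstract can be illustrated with a short sketch: the state is augmented with the queue of pending actions, yielding a delay-free MDP that differs from the original only in the size of its state space, and standard value iteration then applies. The toy two-state, two-action MDP below (transition probabilities, costs, and a delay of one stage) is an illustrative assumption, not taken from the paper.

```python
import itertools

# Illustrative toy MDP (assumption: these numbers are not from the paper).
S = [0, 1]      # base states
A = [0, 1]      # actions
gamma = 0.9     # discount factor
d = 1           # constant action delay: stages until a chosen action takes effect

# P[a][s][s2]: probability of moving s -> s2 when action a takes effect.
P = {
    0: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},
    1: {0: {0: 0.1, 1: 0.9}, 1: {0: 0.6, 1: 0.4}},
}
# c[s][a]: cost induced when action a takes effect in state s.
c = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.2, 1: 2.0}}

# Augmented state: (base state, tuple of the d pending actions).
AUG = [(s, pa) for s in S for pa in itertools.product(A, repeat=d)]

def step(aug, a_new):
    """One stage of the delay-free augmented MDP: the oldest pending
    action takes effect now, and the newly chosen action joins the queue."""
    s, pending = aug
    a_eff = pending[0]
    new_pending = pending[1:] + (a_new,)
    # Distribution over successor augmented states, plus the stage cost.
    dist = {(s2, new_pending): p for s2, p in P[a_eff][s].items()}
    return dist, c[s][a_eff]

# Standard value iteration on the augmented (delay-free) MDP.
V = {x: 0.0 for x in AUG}
for _ in range(500):
    V = {
        x: min(
            cost + gamma * sum(p * V[y] for y, p in dist.items())
            for a in A
            for dist, cost in [step(x, a)]
        )
        for x in AUG
    }

for x in AUG:
    print(x, round(V[x], 3))
```

With delay d, the augmented state space has |S| * |A|^d elements, which is the "differs only in the size of the state space" trade-off the abstract refers to; for random delays the paper's second result applies, which this sketch does not cover.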