☆ 4.6 Article

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

PLOS COMPUTATIONAL BIOLOGY (2013)

期刊

PLOS COMPUTATIONAL BIOLOGY

卷 9, 期 4, 页码 -

出版社

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pcbi.1003024

关键词

类别

Biochemical Research Methods Mathematical & Computational Biology

资金

European Community [FP7-243914(Brain-I-Nets)]
Sinergia grant of the Swiss National Science Foundation (SNF) [CRSIK2\_122697]
Swiss National Science Foundation (SNF) [CRSIK2_122697] Funding Source: Swiss National Science Foundation (SNF)

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

期刊

PLOS COMPUTATIONAL BIOLOGY

出版社

PUBLIC LIBRARY SCIENCE

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

期刊

PLOS COMPUTATIONAL BIOLOGY

出版社

PUBLIC LIBRARY SCIENCE

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文