Journal
NEURAL NETWORKS
Volume 20, Issue 6, Pages 668-675
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2007.04.028
Keywords
dopamine; reinforcement learning; multiple model; timing prediction; classical conditioning
A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference learning. However, earlier models cannot account for recent results of conditioning experiments; specifically, the behavior of dopamine neurons when the interval between a cue stimulus and a reward varies has not been satisfyingly accounted for. We address this problem by using a modular architecture, in which each module consists of a reward predictor and a value estimator. A responsibility signal, computed from the accuracy of the predictions of the reward predictors, is used to weight the contributions and learning of the value estimators. This multiple-model architecture gives an accurate account of the behavior of dopamine neurons in two specific experiments: when the reward is delivered earlier than expected, and when the stimulus-reward interval varies uniformly over a fixed range. (c) 2007 Elsevier Ltd. All rights reserved.
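The core mechanism described in the abstract — modules whose responsibility signals, derived from reward-prediction accuracy, gate both the combined value estimate and each module's learning — can be illustrated with a minimal sketch. All parameter choices (Gaussian likelihood width, learning rate, module predictions) are hypothetical and not taken from the paper:

```python
import numpy as np

# Sketch of a responsibility-weighted multiple-model architecture.
# Each module holds a reward predictor and a value estimate; the
# responsibility lambda_i weights both the combined value output
# and how strongly each module learns from the TD error.

sigma = 0.5   # assumed width of the prediction-error likelihood
alpha = 0.1   # assumed learning rate

reward_pred = np.array([0.2, 0.8, 0.5])   # hypothetical per-module reward predictions
values = np.zeros(len(reward_pred))        # per-module value estimates

def responsibilities(reward, preds, sigma):
    # Softmax over negative squared prediction errors: modules that
    # predict the delivered reward more accurately get higher weight.
    err = reward - preds
    lik = np.exp(-err**2 / (2 * sigma**2))
    return lik / lik.sum()

reward = 1.0
lam = responsibilities(reward, reward_pred, sigma)

# Combined value estimate: responsibility-weighted sum over modules.
V = np.dot(lam, values)

# Learning is gated by responsibility, so the best-predicting module
# adapts most strongly.
td_error = reward - values
values += alpha * lam * td_error
```

Here the module predicting 0.8 is closest to the delivered reward of 1.0, so it receives the largest responsibility and the largest value update; this is the weighting-of-contributions-and-learning idea the abstract describes, not the authors' exact equations.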