Article

Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System

Journal

NEURAL COMPUTATION
Volume 20, Issue 12, Pages 3034-3054

Publisher

MIT PRESS
DOI: 10.1162/neco.2008.11-07-654

Keywords

-

Funding

  1. Informatics Circle of Research Excellence of Alberta, Canada
  2. Natural Sciences and Engineering Research Council of Canada

Abstract

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and improves the correspondence between model and data in several experiments, including those in which reward is omitted or received early. The improved fit derives mostly from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.

