Article

The statistical structures of reinforcement learning with asymmetric value updates

Journal

JOURNAL OF MATHEMATICAL PSYCHOLOGY
Volume 87, Pages 31-45

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jmp.2018.09.002

Keywords

Reinforcement learning; Asymmetric value update; Learning rate; Choice perseverance; Model fitting; Logistic regression

Funding

  1. JSPS KAKENHI, Japan [JP17H05946, 18K03173]
  2. Grants-in-Aid for Scientific Research [18K03173] Funding Source: KAKEN

Abstract

Reinforcement learning (RL) models have been broadly used in modeling the choice behavior of humans and other animals. In standard RL models, the action values are assumed to be updated according to the reward prediction error (RPE), i.e., the difference between the obtained reward and the expected reward. Numerous studies have noted that the magnitude of the update is biased depending on the sign of the RPE. The bias is represented in RL models by differential learning rates for positive and negative RPEs. However, which aspect of behavioral data the estimated differential learning rates reflect is not well understood. In this study, we investigate how the differential learning rates influence the statistical properties of choice behavior (i.e., the relation between past experiences and the current choice) based on theoretical considerations and numerical simulations. We clarify that when the learning rates differ, the impact of a past outcome depends on the subsequent outcomes, in contrast to standard RL models with symmetric value updates. Based on these results, we propose a model-neutral statistical test to validate the hypothesis that value updates are asymmetric. The asymmetry in the value updates induces an autocorrelation of choice (i.e., a tendency to repeat the same choice or to switch the choice irrespective of past rewards). Conversely, if an RL model without an intrinsic autocorrelation factor is fitted to data that possess an intrinsic autocorrelation, a statistical bias to overestimate the difference in learning rates arises. We demonstrate that this bias can cause a statistical artifact in RL-model fitting, leading to a pseudo-positivity bias and a pseudo-confirmation bias. (C) 2018 The Author. Published by Elsevier Inc.
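The asymmetric value update described in the abstract can be sketched as follows. This is a minimal illustration of the general idea (separate learning rates for positive and negative RPEs), not the paper's full model; the function and parameter names are my own:

```python
def update_value(q, reward, alpha_pos, alpha_neg):
    """Asymmetric value update: the effective learning rate depends on
    the sign of the reward prediction error (RPE = reward - q)."""
    rpe = reward - q
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    return q + alpha * rpe

# Positive RPE uses alpha_pos: 0.5 + 0.4 * (1.0 - 0.5) = 0.7
q_after_win = update_value(0.5, 1.0, alpha_pos=0.4, alpha_neg=0.1)

# Negative RPE uses alpha_neg: 0.5 + 0.1 * (0.0 - 0.5) = 0.45
q_after_loss = update_value(0.5, 0.0, alpha_pos=0.4, alpha_neg=0.1)
```

With `alpha_pos == alpha_neg` this reduces to the standard symmetric delta rule; with unequal rates, the weight of a past outcome on the current value depends on the signs of the outcomes that followed it, which is the statistical signature the paper analyzes.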

