Article

The statistical structures of reinforcement learning with asymmetric value updates

Journal

JOURNAL OF MATHEMATICAL PSYCHOLOGY
Volume 87, Pages 31-45

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jmp.2018.09.002

Keywords

Reinforcement learning; Asymmetric value update; Learning rate; Choice perseverance; Model fitting; Logistic regression

Funding

  1. JSPS KAKENHI, Japan [JP17H05946, 18K03173]
  2. Grants-in-Aid for Scientific Research [18K03173] Funding Source: KAKEN

Abstract

Reinforcement learning (RL) models have been broadly used in modeling the choice behavior of humans and other animals. In standard RL models, the action values are assumed to be updated according to the reward prediction error (RPE), i.e., the difference between the obtained reward and the expected reward. Numerous studies have noted that the magnitude of the update is biased depending on the sign of the RPE. The bias is represented in RL models by differential learning rates for positive and negative RPEs. However, which aspect of behavioral data the estimated differential learning rates reflect is not well understood. In this study, we investigate how the differential learning rates influence the statistical properties of choice behavior (i.e., the relation between past experiences and the current choice) based on theoretical considerations and numerical simulations. We clarify that when the learning rates differ, the impact of a past outcome depends on the subsequent outcomes, in contrast to standard RL models with symmetric value updates. Based on these results, we propose a model-neutral statistical test to validate the hypothesis that value updates are asymmetric. The asymmetry in the value updates induces an autocorrelation of choice (i.e., a tendency to repeat the same choice or to switch the choice irrespective of past rewards). Conversely, if an RL model without an intrinsic autocorrelation factor is fitted to data that possess an intrinsic autocorrelation, a statistical bias to overestimate the difference in learning rates arises. We demonstrate that this bias can cause a statistical artifact in RL-model fitting, leading to a pseudo-positivity bias and a pseudo-confirmation bias. (C) 2018 The Author. Published by Elsevier Inc.
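The asymmetric value update described in the abstract can be sketched as follows. This is a minimal illustration of the general idea (separate learning rates for positive and negative RPEs), not the paper's full model; the function and parameter names are my own:

```python
def update_value(q, reward, alpha_pos, alpha_neg):
    """Asymmetric value update: the effective learning rate depends on
    the sign of the reward prediction error (RPE = reward - q)."""
    rpe = reward - q
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    return q + alpha * rpe

# Positive RPE uses alpha_pos: 0.5 + 0.4 * (1.0 - 0.5) = 0.7
q_after_win = update_value(0.5, 1.0, alpha_pos=0.4, alpha_neg=0.1)

# Negative RPE uses alpha_neg: 0.5 + 0.1 * (0.0 - 0.5) = 0.45
q_after_loss = update_value(0.5, 0.0, alpha_pos=0.4, alpha_neg=0.1)
```

With `alpha_pos == alpha_neg` this reduces to the standard symmetric delta rule; with unequal rates, the weight of a past outcome on the current value depends on the signs of the outcomes that followed it, which is the statistical signature the paper analyzes.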

