
Signals in Human Striatum Are Appropriate for Policy Update Rather than Value Prediction

JOURNAL OF NEUROSCIENCE (2011)

Journal

JOURNAL OF NEUROSCIENCE

Volume 31, Issue 14, Pages 5504-5511

Publisher

SOC NEUROSCIENCE

DOI: 10.1523/JNEUROSCI.6316-10.2011


Funding

National Institute of Mental Health as part of the National Science Foundation/National Institutes of Health [R01MH087882]
McKnight Foundation


Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is what content the decision representations trained by these error signals carry. We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about both forgone and obtained monetary outcomes, so as to dissociate teaching signals that update expected values for each action from signals that train relative preferences between actions (a policy). The reward probabilities of the two options varied independently of each other. This design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or by value-based methods such as Q-learning, in which choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found that participants' choices were significantly influenced by both the obtained and the forgone rewards from the previous trial. We also found that subjects' blood oxygen level-dependent (BOLD) responses in the striatum were modulated in opposite directions by experienced and forgone rewards, but not by reward expectancy. This neural pattern, together with subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than with prediction errors for updating separate action values.
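
To make the contrast between the two kinds of teaching signal concrete, here is a minimal Python sketch for a two-armed counterfactual bandit. It is an illustrative assumption, not the authors' fitted models: the function names (q_learning_errors, policy_error, update_q, update_pref, choice_prob), the learning rate alpha, the softmax inverse temperature beta, and the specific obtained-minus-forgone form of the policy update are all hypothetical choices made for the example. The point it shows is that a Q-learning error depends on the current reward expectancy, whereas the policy signal depends only on the obtained and forgone outcomes, with opposite signs.

```python
import math


def q_learning_errors(q, chosen, r_obtained, r_forgone):
    """Value-based (Q-learning style) teaching signals with counterfactual
    feedback: each arm gets its own prediction error, i.e. its outcome minus
    that arm's current expected value q[arm]. (Illustrative assumption.)"""
    unchosen = 1 - chosen
    return r_obtained - q[chosen], r_forgone - q[unchosen]


def policy_error(r_obtained, r_forgone):
    """Hypothetical policy-based teaching signal for the relative preference
    between arms: obtained minus forgone reward, with no expectancy term."""
    return r_obtained - r_forgone


def update_q(q, chosen, r_obtained, r_forgone, alpha=0.2):
    """Move each arm's expected value toward its own outcome."""
    d_chosen, d_unchosen = q_learning_errors(q, chosen, r_obtained, r_forgone)
    new_q = list(q)
    new_q[chosen] += alpha * d_chosen
    new_q[1 - chosen] += alpha * d_unchosen
    return new_q


def update_pref(pref, chosen, r_obtained, r_forgone, alpha=0.2):
    """Shift the relative preference toward the chosen arm when the obtained
    reward beats the forgone one, and away from it otherwise."""
    delta = policy_error(r_obtained, r_forgone)
    new_pref = list(pref)
    new_pref[chosen] += alpha * delta
    new_pref[1 - chosen] -= alpha * delta
    return new_pref


def choice_prob(values, beta=3.0):
    """Softmax probability of choosing arm 0, given either Q-values or
    preferences for the two arms."""
    return 1.0 / (1.0 + math.exp(-beta * (values[0] - values[1])))


if __name__ == "__main__":
    # The same trial (chose arm 0, obtained 1, forgone arm paid 0) evaluated
    # under two different reward expectancies for the chosen arm.
    for q in ([0.2, 0.5], [0.8, 0.5]):
        print("Q errors:", q_learning_errors(q, 0, 1.0, 0.0),
              "policy error:", policy_error(1.0, 0.0))
```

Running the example prints Q-learning errors for the chosen arm that shrink as its expected value rises, while the policy error stays the same on both lines. In this toy form, that expectancy-independent, opposite-signed dependence on experienced and forgone rewards is the signature the abstract attributes to the striatal signal.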
