4.8 Article

Prefrontal solution to the bias-variance tradeoff during reinforcement learning

Journal

CELL REPORTS
Volume 37, Issue 13, Pages -

Publisher

CELL PRESS
DOI: 10.1016/j.celrep.2021.110185

Keywords

-

Categories

Funding

  1. Institute for Information & Communications Technology Planning & Evaluation (IITP) - Korean government (MSIT) [2019-0-01371]
  2. National Research Foundation of Korea - Korean government (MSIT) [NRF-2019M3E5D2A01066267]
  3. NRF - Korean government (MSIT) [2021M3E5D2A0102249311]
  4. IITP - Korean government [2017-0-00451]
  5. Samsung Research Funding Center of Samsung Electronics [SRFC-TC1603-52]
  6. National Research Foundation of Korea [2019M3E5D2A01066267] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)


The brain has been found to adaptively resolve the tradeoff between bias and variance during reinforcement learning, requiring baseline correction for prediction error to offset the adverse effects of irreducible error on value learning. Behavioral evidence of adaptive control has been shown in a Markov decision task with context changes, suggesting that the prediction error baseline signals context changes to improve adaptability. Multiplexed representations of the prediction error baseline within specific brain regions have been identified, indicating their role in guiding model-based and model-free reinforcement learning.
Evidence that the brain combines different value learning strategies to minimize prediction error is accumulating. However, the tradeoff between bias and variance error, which imposes different constraints on each learning strategy's performance, poses a challenge for value learning. While this tradeoff specifies the requirements for optimal learning, little is known about how the brain deals with this issue. Here, we hypothesize that the brain adaptively resolves the bias-variance tradeoff during reinforcement learning. Our theory suggests that the solution necessitates baseline correction for prediction error, which offsets the adverse effects of irreducible error on value learning. We show behavioral evidence of adaptive control using a Markov decision task with context changes. The prediction error baseline seemingly signals context changes to improve adaptability. Critically, we identify multiplexed representations of the prediction error baseline within the ventrolateral and ventromedial prefrontal cortex, key brain regions known to guide model-based and model-free reinforcement learning.
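
The core computational idea described in the abstract, baseline correction of the reward prediction error, can be made concrete with a short sketch. The following Python snippet is a minimal, hypothetical illustration assuming a standard temporal-difference update in which a running baseline is subtracted from the prediction error; the toy environment, the learning rates alpha and beta, and the discount factor gamma are assumptions made for illustration only and are not the authors' task, model, or code.

```python
import numpy as np

# Minimal sketch (illustration only, not the authors' model): a temporal-difference
# value update in which the prediction error is corrected by a running baseline.
# The baseline absorbs persistent offsets in the prediction error, e.g. after a
# context change, so that the value update is driven by learnable structure
# rather than irreducible error.

rng = np.random.default_rng(0)

n_states = 5
V = np.zeros(n_states)   # state-value estimates
baseline = 0.0           # running estimate of the prediction-error baseline
alpha = 0.10             # value learning rate (assumed)
beta = 0.05              # baseline learning rate (assumed)
gamma = 0.9              # discount factor (assumed)

def step(t):
    """Toy environment: random transitions, a noisy reward, and a reward
    offset introduced halfway through to mimic a context change."""
    next_state = rng.integers(n_states)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    reward += rng.normal(scale=0.5)   # irreducible noise in the outcome
    if t >= 1000:                     # simulated context change
        reward += 2.0
    return next_state, reward

state = 0
for t in range(2000):
    next_state, reward = step(t)
    delta = reward + gamma * V[next_state] - V[state]  # TD prediction error
    V[state] += alpha * (delta - baseline)             # baseline-corrected value update
    baseline += beta * (delta - baseline)              # track the prediction-error baseline
    state = next_state

print("state values:", np.round(V, 2))
print("prediction-error baseline:", round(float(baseline), 2))
```

In this sketch the running baseline rises after the simulated context change and limits how much of the offset leaks into the value estimates; the intent is only to show, under these assumptions, how a prediction error baseline can absorb a context-related offset, not to reproduce the paper's task or analyses.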

Authors

Reviews

Primary Rating

4.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
