Article

Humans Use Directed and Random Exploration to Solve the Explore-Exploit Dilemma

Journal

JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL
Volume 143, Issue 6, Pages 2074-2081

Publisher

AMER PSYCHOLOGICAL ASSOC
DOI: 10.1037/a0038199

Keywords

explore-exploit; decision making; information bonus; decision noise; reinforcement learning

Funding

  1. NIMH NIH HHS [T32 MH065214] Funding Source: Medline

All adaptive organisms face the fundamental tradeoff between pursuing a known reward (exploitation) and sampling lesser-known options in search of something better (exploration). Theory suggests at least two strategies for solving this dilemma: a directed strategy in which choices are explicitly biased toward information seeking, and a random strategy in which decision noise leads to exploration by chance. In this work we investigated the extent to which humans use these two strategies. In our Horizon task, participants made explore-exploit decisions in two contexts that differed in the number of choices that they would make in the future (the time horizon). Participants were allowed to make either a single choice in each game (horizon 1), or 6 sequential choices (horizon 6), giving them more opportunity to explore. By modeling the behavior in these two conditions, we were able to measure exploration-related changes in decision making and quantify the contributions of the two strategies to behavior. We found that participants were more information seeking and had higher decision noise with the longer horizon, suggesting that humans use both strategies to solve the exploration-exploitation dilemma. We thus conclude that both information seeking and choice variability can be controlled and put to use in the service of exploration.
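The two strategies named in the abstract can be illustrated with a small choice model. The sketch below is an assumption for illustration only, not the model reported in the paper: it treats directed exploration as an additive information bonus on the less-sampled option and random exploration as the noise parameter of a logistic choice rule. The function name `p_choose_informative` and its parameters `delta_reward`, `info_bonus`, and `noise` are hypothetical.

```python
import math


def p_choose_informative(delta_reward, info_bonus, noise):
    """Probability of choosing the less-sampled (more informative) option.

    Illustrative assumption, not the paper's fitted model:
      delta_reward: observed mean reward of the informative option
                    minus that of the other option
      info_bonus:   value added to the informative option
                    (stands in for directed exploration)
      noise:        spread of the logistic choice rule, must be > 0
                    (stands in for random exploration)
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    # Logistic choice rule: the bonus shifts the curve, the noise flattens it.
    return 1.0 / (1.0 + math.exp(-(delta_reward + info_bonus) / noise))


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the two effects.
    print(p_choose_informative(-10, info_bonus=0, noise=5))   # little exploration
    print(p_choose_informative(-10, info_bonus=8, noise=5))   # larger bonus
    print(p_choose_informative(-10, info_bonus=0, noise=20))  # larger noise
```

Under these assumptions, a larger information bonus shifts choices toward the informative option even when its observed mean is lower, while a larger noise value pulls choice probabilities toward 0.5 regardless of which option looks better. A horizon effect of the kind the abstract describes would appear as both parameters being larger in the horizon 6 condition than in horizon 1.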
