The Role of Reinforcement Learning in the Emergence of Conventions: Simulation Experiments with the Repeated Volunteer's Dilemma

Publisher

JASSS
DOI: 10.18564/jasss.4771

Keywords

Conventions; Repeated Games; Volunteer's Dilemma; Agent-Based Simulation; Reinforcement Learning; Cognitive Modeling

This study investigates the role of cognitive mechanisms in the emergence of conventions using reinforcement learning models in the repeated volunteer's dilemma. The results show that reinforcement learning models can explain how individuals tacitly agree on a course of action and that contextual cues and equal cost distribution facilitate coordination when optimal solutions are less salient.
We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer's dilemma (VOD). The VOD is a multi-person, binary-choice collective goods game in which the contribution of only one individual is necessary and sufficient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD, where all group members have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD, where one strong group member has lower costs of volunteering, a solitary-volunteering convention emerges with the strong member volunteering most of the time. We compare three different classes of reinforcement learning models in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models can provide a parsimonious account of how humans tacitly agree on one course of action when encountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) for strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring equal distribution of costs) facilitates coordination when optima are less salient. Furthermore, our models produce better fits with the empirical data when agents act myopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.
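
The payoff structure of the VOD described above can be made concrete with a small sketch. This is not the authors' simulation code: the function names, the benefit of 80, and the volunteering costs of 50 (or 30 for the strong member) are hypothetical values chosen only for illustration. The sketch assumes that every group member receives the benefit if at least one member volunteers, and that each volunteer pays their own cost.

```python
from typing import List, Sequence


def vod_payoffs(actions: Sequence[bool], costs: Sequence[float], benefit: float) -> List[float]:
    """Payoffs for one round of a volunteer's dilemma.

    actions[i] is True if player i volunteers. The benefit is produced for
    everyone as soon as at least one player volunteers; each volunteer pays
    their own cost, whether or not someone else also volunteered.
    """
    produced = any(actions)
    return [
        (benefit if produced else 0.0) - (cost if volunteered else 0.0)
        for volunteered, cost in zip(actions, costs)
    ]


def cumulative_payoffs(schedule: Sequence[Sequence[bool]], costs: Sequence[float], benefit: float) -> List[float]:
    """Sum each player's payoffs over a sequence of rounds."""
    totals = [0.0] * len(costs)
    for actions in schedule:
        for i, payoff in enumerate(vod_payoffs(actions, costs, benefit)):
            totals[i] += payoff
    return totals


# Hypothetical parameters, not taken from the paper.
BENEFIT = 80.0
SYMMETRIC_COSTS = [50.0, 50.0, 50.0]   # all members bear the same cost
ASYMMETRIC_COSTS = [30.0, 50.0, 50.0]  # player 0 is the "strong" member

# Turn-taking: each player volunteers once in three rounds.
turn_taking = [
    [True, False, False],
    [False, True, False],
    [False, False, True],
]

# Solitary volunteering: the strong member volunteers in every round.
solitary = [[True, False, False]] * 3

symmetric_totals = cumulative_payoffs(turn_taking, SYMMETRIC_COSTS, BENEFIT)
asymmetric_totals = cumulative_payoffs(solitary, ASYMMETRIC_COSTS, BENEFIT)
print(symmetric_totals)   # [190.0, 190.0, 190.0]
print(asymmetric_totals)  # [150.0, 240.0, 240.0]
```

Under these assumed numbers, turn-taking spreads the cost of volunteering equally in the symmetric game, while solitary volunteering by the strong member keeps the group's total cost lowest in the asymmetric game but leaves that member with the smallest cumulative payoff.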
