Article

Emergence of cooperation in two-agent repeated games with reinforcement learning

Journal

CHAOS SOLITONS & FRACTALS
Volume 175

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.chaos.2023.114032

Keywords

Nonlinear dynamics; Cooperation; Repeated game; Reinforcement learning


Cooperation is essential in ecosystems and human society, and reinforcement learning plays a crucial role in understanding its emergence. This study focuses on the individual-level dynamics of cooperation in a two-agent system. It is found that strong memory and long-sighted expectation lead to the emergence of Coordinated Optimal Policies (COPs), which maintain high cooperation levels. However, when memory weakens and expectation decreases, cooperation becomes unstable and the policy of defection prevails. The study also suggests that tolerance can be a precursor to a crisis in cooperation. The findings provide insights into the stability of cooperation and have implications for more complex scenarios.
Cooperation is the foundation of ecosystems and human society, and reinforcement learning provides crucial insight into the mechanism of its emergence. However, most previous work has focused on self-organization at the population level, while the fundamental dynamics at the individual level remain unclear. Here, we investigate the evolution of cooperation in a two-agent system, where each agent pursues optimal policies according to the classical Q-learning algorithm while playing the strict prisoner's dilemma. We reveal that strong memory and long-sighted expectation yield the emergence of Coordinated Optimal Policies (COPs), in which both agents act like Win-Stay, Lose-Shift (WSLS) to maintain a high level of cooperation. Otherwise, players become tolerant of their co-player's defection, and cooperation eventually loses stability as the all-defection (All-D) policy prevails. This suggests that tolerance could be a good precursor to a crisis in cooperation. Furthermore, our analysis shows that the Coordinated Optimal Modes (COMs) for different COPs gradually lose stability as memory weakens and expectation for the future decreases, so that agents fail to predict their co-player's actions and defection dominates. As a result, we derive the constraints on expectation of the future and memory strength required to maintain cooperation. In contrast to previous work, the impact of exploration on cooperation is found not to be consistent, but to depend on the composition of COMs. By clarifying these fundamental issues in the two-player system, we hope our work will be helpful for understanding the emergence and stability of cooperation in more complex real-world scenarios.
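The setup described in the abstract can be illustrated with a minimal sketch: two classical Q-learning agents repeatedly play a prisoner's dilemma, each conditioning on the previous joint action. Note that the payoff values, the one-step memory encoding of the state, and the parameter names (`alpha`, `gamma`, `eps`) are illustrative assumptions, not the paper's exact model; in the paper's terms, the learning rate loosely relates to memory strength, the discount factor to expectation of the future, and `eps` to exploration.

```python
import random

# Illustrative prisoner's dilemma payoffs satisfying T > R > P > S
# (not necessarily the values used in the paper).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
ACTIONS = ['C', 'D']

class QAgent:
    """Tabular Q-learner whose state is the previous joint action (assumed encoding)."""
    def __init__(self, alpha=0.9, gamma=0.95, eps=0.02):
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.Q = {}  # (state, action) -> value, default 0

    def q(self, s, a):
        return self.Q.get((s, a), 0.0)

    def act(self, s):
        # epsilon-greedy exploration
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q(s, a))

    def update(self, s, a, r, s2):
        # Classical Q-learning update with learning rate alpha, discount gamma.
        best = max(self.q(s2, b) for b in ACTIONS)
        self.Q[(s, a)] = (1 - self.alpha) * self.q(s, a) + self.alpha * (r + self.gamma * best)

def run(rounds=20000, seed=0):
    """Play the repeated game and return the fraction of cooperative moves."""
    random.seed(seed)
    a1, a2 = QAgent(), QAgent()
    state = ('C', 'C')  # arbitrary initial joint action (own move first)
    coop = 0
    for _ in range(rounds):
        # Each agent sees the joint action from its own perspective.
        x = a1.act(state)
        y = a2.act(state[::-1])
        r1, r2 = PAYOFF[(x, y)]
        nxt = (x, y)
        a1.update(state, x, r1, nxt)
        a2.update(state[::-1], y, r2, nxt[::-1])
        state = nxt
        coop += (x == 'C') + (y == 'C')
    return coop / (2 * rounds)
```

With large `alpha`/`gamma` (strong learning signal, long-sighted expectation), runs of this kind can settle into WSLS-like coordinated play; lowering the discount factor tends to make All-D take over, mirroring the qualitative transition the paper reports.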

Authors

