Article

Quantifying Reinforcement-Learning Agent's Autonomy, Reliance on Memory and Internalisation of the Environment

Journal

ENTROPY
Volume 24, Issue 3, Pages -

Publisher

MDPI
DOI: 10.3390/e24030401

Keywords

autonomy; reinforcement learning; information theory; partial information decomposition; non-trivial informational closure; deep learning

Funding

  1. Estonian Centre of Excellence in IT (EXCITE) [TK148]
  2. European Union [952060]
  3. European Social Fund via IT Academy Programme [SLTAT18311]
  4. Estonian Research Council [PRG1604]
  5. Niedersächsisches Vorab (of the VolkswagenStiftung under the program 'Big Data in den Lebenswissenschaften') [ZN3326, ZN3371]

Abstract

Intuitively, the level of autonomy of an agent is related to the degree to which the agent's goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in the limit of the time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, in this work, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments on two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising the sequence. PID also allows us to quantify how much the agent relies on its internal memory (versus how much it relies on the observations) when transitioning to its next internal state. The experiments show that specific terms of the PID strongly correlate with the obtained reward and with the agent's behaviour under perturbations of the observations.
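The abstract hinges on decomposing, via PID, the information that the agent's previous memory state and its current observation jointly carry about the next memory state. As a rough illustration of what such a decomposition computes, below is a minimal, self-contained NumPy sketch of the classic Williams-Beer I_min PID for two discrete sources; this is one possible PID measure, not necessarily the estimator used in the paper, and all variable names are illustrative. In the paper's setting, source S1 would play the role of the agent's memory, S2 the observation, and T the next memory state, so that the unique and redundant terms track reliance on memory versus control by the environment.

```python
import numpy as np

def _mi(p_xy):
    """Mutual information I(X;Y) in bits from a joint pmf (rows: x, cols: y)."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (px @ py)[mask])))

def pid_imin(p):
    """Williams-Beer PID of I(S1,S2; T) from a joint pmf p[s1, s2, t].

    Returns (redundancy, unique_1, unique_2, synergy), all in bits.
    """
    p = p / p.sum()
    p_t = p.sum(axis=(0, 1))        # marginal of the target T
    p_s1t = p.sum(axis=1)           # joint (S1, T)
    p_s2t = p.sum(axis=0)           # joint (S2, T)

    def specific_info(p_st):
        # Specific information a source carries about each target value t:
        # I(S; T=t) = sum_s p(s|t) * log2( p(t|s) / p(t) ).
        si = np.zeros(len(p_t))
        for t in range(len(p_t)):
            if p_t[t] == 0:
                continue
            for s in range(p_st.shape[0]):
                if p_st[s, t] == 0:
                    continue
                p_s = p_st[s, :].sum()
                si[t] += (p_st[s, t] / p_t[t]) * np.log2((p_st[s, t] / p_s) / p_t[t])
        return si

    si1, si2 = specific_info(p_s1t), specific_info(p_s2t)
    # I_min redundancy: expected minimum specific information over target values.
    redundancy = float(np.sum(p_t * np.minimum(si1, si2)))

    i1 = _mi(p_s1t)                           # I(S1; T)
    i2 = _mi(p_s2t)                           # I(S2; T)
    i12 = _mi(p.reshape(-1, p.shape[2]))      # I(S1,S2; T)

    unique_1 = i1 - redundancy
    unique_2 = i2 - redundancy
    synergy = i12 - unique_1 - unique_2 - redundancy
    return redundancy, unique_1, unique_2, synergy

# Sanity check: T = S1 XOR S2 with uniform binary sources is pure synergy (1 bit).
p = np.zeros((2, 2, 2))
for s1 in (0, 1):
    for s2 in (0, 1):
        p[s1, s2, s1 ^ s2] = 0.25
print(pid_imin(p))  # approximately (0.0, 0.0, 0.0, 1.0)
```

In practice, the joint pmf over (memory, observation, next memory) would be estimated from rollouts of the trained RL agent; the sketch above only shows the decomposition step once such a distribution is in hand.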

