Article

Uncertainty maximization in partially observable domains: A cognitive perspective

Journal

NEURAL NETWORKS
Volume 162, Pages 456-471

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2023.02.044

Keywords

Partially observable Markov decision process; Cognitive modeling; Entropy; Reinforcement learning; Attention mechanisms; Neural networks


Abstract

Faced with the ever-increasing complexity of their application domains, artificial learning agents can now scale up their capacity to process overwhelming amounts of data, but at the cost of encoding and processing increasingly redundant information. This work exploits the ability of learning systems operating in partially observable domains to focus selectively on the type of information most likely related to the causal interaction among transitioning states. A temporal difference displacement criterion is defined to implement adaptive masking of the observations. It enables a significant improvement in the convergence of temporal difference algorithms applied to partially observable Markov processes, as shown by experiments on a variety of machine learning problems, ranging from highly complex visual domains such as Atari games to simple textbook control problems such as CartPole. The proposed framework can be added to most RL algorithms, since it affects only the observation process, selecting the parts most promising for explaining the dynamics of the environment and reducing the dimension of the observation space. (c) 2023 Elsevier Ltd. All rights reserved.
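The abstract describes masking observation dimensions according to a temporal difference displacement criterion. The paper's exact criterion is not given here, so the following is only a minimal illustrative sketch, assuming a vector observation space and a given value function: each dimension is scored by how much zeroing it out displaces the TD error, and only the highest-scoring dimensions are kept. All function names (`td_displacement_scores`, `select_mask`) and the zero-masking proxy are hypothetical, not the authors' implementation.

```python
import numpy as np

def td_displacement_scores(obs, next_obs, rewards, value_fn, gamma=0.99):
    """Score each observation dimension by how much masking it displaces
    the TD error (illustrative proxy for a TD displacement criterion).

    obs, next_obs: arrays of shape (batch, dim); rewards: shape (batch,).
    value_fn maps a (batch, dim) array to a (batch,) array of values.
    """
    base_td = rewards + gamma * value_fn(next_obs) - value_fn(obs)
    scores = np.zeros(obs.shape[1])
    for d in range(obs.shape[1]):
        masked, masked_next = obs.copy(), next_obs.copy()
        masked[:, d] = 0.0       # hypothetical mask: zero out dimension d
        masked_next[:, d] = 0.0
        td = rewards + gamma * value_fn(masked_next) - value_fn(masked)
        # Mean absolute displacement of the TD error caused by the mask.
        scores[d] = np.mean(np.abs(td - base_td))
    return scores

def select_mask(scores, keep_fraction=0.5):
    """Keep the dimensions whose masking displaces the TD error most,
    reducing the effective dimension of the observation space."""
    k = max(1, int(len(scores) * keep_fraction))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask
```

Because the mask is computed purely from the observation stream and a value estimate, a filter like this could in principle sit in front of most RL algorithms, as the abstract suggests, without altering the learning rule itself.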

