Article

A Study of First-Passage Time Minimization via Q-Learning in Heated Gridworlds

Journal

IEEE ACCESS
Volume 9

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3129709

Keywords

First-passage times; path planning; reinforcement learning; stochastic systems

Funding

  1. Russian Science Foundation [21-11-00363]

Abstract

The study reveals bias effects in agents trained with tabular temporal-difference methods: state-dependent noise leads them to converge to suboptimal solutions in practical learning scenarios. A high learning rate may inhibit exploration of higher-temperature regions, while a sufficiently low rate can increase agent presence there, which may affect the application of reinforcement learning methods in real-world scenarios.
Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, noise levels are often unevenly distributed across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results show certain bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning. Namely, the state dependency of the noise triggers convergence to suboptimal solutions, and the respective policies follow these solutions for practically long learning times. A high learning rate prevents exploration of regions with higher temperature, while a sufficiently low rate increases the presence of agents in such regions. These biases of temporal-difference-based reinforcement learning methods may have implications for their application in real-world physical scenarios and for agent design.
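
To make the setting concrete, the following is a minimal sketch of tabular Q-learning on a one-dimensional gridworld with state-dependent noise. It is an illustration under stated assumptions, not the environment or code used in the paper: the noise model (in cell `s`, the chosen move is replaced by a uniformly random step with probability `temps[s]`), the reward of -1 per step, and the names `train_q_learning` and `temps` are all hypothetical choices made for this example.

```python
import random


def train_q_learning(temps, episodes=500, alpha=0.1, gamma=1.0,
                     epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Q-learning on a toy 1-D 'heated' gridworld.

    Assumption (not from the paper): temps[s] is the probability that
    thermal noise overrides the chosen action in cell s with a random step.
    The agent starts in cell 0; the last cell is the absorbing target.
    """
    rng = random.Random(seed)
    n = len(temps)
    goal = n - 1
    actions = (-1, +1)                      # index 0: left, index 1: right
    q = [[0.0, 0.0] for _ in range(n)]      # Q-table, one row per cell

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection (ties go to "right").
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # State-dependent noise: hotter cells scramble the move more often.
            if rng.random() < temps[s]:
                step = rng.choice(actions)
            else:
                step = actions[a]
            s_next = min(max(s + step, 0), goal)

            # Reward of -1 per step, so maximizing return shortens
            # the first-passage time to the target.
            reward = -1.0
            if s_next == goal:
                target = reward
            else:
                target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if s == goal:
                break
    return q
```

For instance, `train_q_learning([0.0, 0.0, 0.5, 0.5, 0.0, 0.0])` trains on a six-cell world with a hot region in the middle. Rerunning with different `alpha` values is one way to probe learning-rate dependence in this toy setting; it does not reproduce the paper's experiments.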
