4.7 Article

Within the scope of prediction: Shaping intrinsic rewards via evaluating uncertainty

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 206

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.117775

Keywords

Reinforcement learning; Policy optimization; Exploration; Prediction; Reward shaping

Funding

  1. National Natural Science Foundation of China [61303108]
  2. Natural Science Foundation of Jiangsu Province, China [BK20211102]
  3. Priority Academic Program Development of Jiangsu Higher Education Institutions, China


The scope of prediction based on uncertainty exploration (SPE) method improves the quality of exploration and reduces noise interference in deep reinforcement learning, yielding significant improvements in simulated environments.
A reinforcement learning agent must explore to learn about its environment and find an optimal policy. However, simply increasing the frequency of stochastic exploration sometimes fails or even causes the agent to fall into traps. Solving this problem requires improving the quality of exploration. This paper proposes an approach, the scope of prediction based on uncertainty exploration (SPE), which takes advantage of an uncertainty mechanism and accounts for the stochasticity of exploration. Under the uncertainty mechanism, unexpected states arouse more curiosity: the model projects future scenarios and compares them with the actual future, and a larger mismatch yields higher uncertainty and drives exploration. The SPE method uses a prediction network to predict subsequent observations and computes the mean squared difference between the predicted and the actual subsequent observations to measure uncertainty, encouraging the agent to explore unknown regions more effectively. Moreover, to reduce the noise interference that uncertainty introduces, a reward penalty model is developed that identifies noise from the current observations and actions by predicting future rewards, which improves robustness to noise so that the agent can escape from noisy regions. Experimental results show that deep reinforcement learning approaches equipped with SPE achieve significant improvements in simulated environments.
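
The uncertainty measure described in the abstract can be illustrated with a minimal sketch. It assumes the prediction network's output is already available as an array, and it reads the "mean squared difference" as being taken between the predicted and the actual next observation. The function names, the `beta` weight, and the way the noise penalty is subtracted are hypothetical choices made for illustration only; the abstract does not specify how SPE combines these terms.

```python
import numpy as np


def prediction_uncertainty(predicted_next_obs, actual_next_obs):
    """Mean squared difference between predicted and actual next observation."""
    predicted = np.asarray(predicted_next_obs, dtype=np.float64)
    actual = np.asarray(actual_next_obs, dtype=np.float64)
    return float(np.mean((predicted - actual) ** 2))


def shaped_reward(extrinsic, uncertainty, noise_penalty=0.0, beta=0.1):
    """Add a penalized intrinsic bonus to the environment reward.

    Assumption (not from the paper): the penalty is subtracted from the
    uncertainty and the bonus is clipped at zero, so a region judged to be
    pure noise contributes no exploration bonus.
    """
    bonus = max(uncertainty - noise_penalty, 0.0)
    return extrinsic + beta * bonus


# Example: a poorly predicted transition earns a larger bonus.
u = prediction_uncertainty([0.0, 0.0], [1.0, 3.0])
r = shaped_reward(extrinsic=1.0, uncertainty=u, noise_penalty=1.0, beta=0.5)
```

In this sketch a large prediction error raises the reward for visiting a state, while the penalty term lowers it where the error is attributed to noise rather than novelty.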
