Article

Deep reinforcement learning-based safe interaction for industrial human-robot collaboration using intrinsic reward function

Journal

Advanced Engineering Informatics
Volume 49

Publisher

Elsevier Sci Ltd
DOI: 10.1016/j.aei.2021.101360

Keywords

Industrial human-robot collaboration; Collision avoidance; Deep reinforcement learning; Intrinsic reward function

Funding

  1. National Natural Science Foundation of China [51775399, 51675389]
  2. Fundamental Research Funds for the Central Universities [WUT: 2020III047]
  3. International Science and Technology Innovation Cooperation Project of Sichuan Province [20GJHZ0039]


This paper introduces a deep reinforcement learning approach for real-time collision-free motion planning of an industrial robot, aiming to ensure operator safety during human-robot collaboration in manufacturing. By optimizing the reward function and combining it with the DDPG algorithm, the proposed IRDDPG algorithm enables the robot to effectively learn an expected collision avoidance policy in a simulation environment.
In human-robot collaboration in manufacturing, operator safety is the primary concern during manufacturing operations. This paper presents a deep reinforcement learning approach to real-time collision-free motion planning of an industrial robot for human-robot collaboration. First, the safe human-robot collaborative manufacturing problem is formulated as a Markov decision process, and a mathematical expression of the reward function design problem is given. The goal is for the robot to autonomously learn a policy that reduces the accumulated risk while preserving the task completion time during human-robot collaboration. To transform this optimization objective into a reward function that guides the robot toward the expected behaviour, a reward function optimization approach based on the deterministic policy gradient is proposed to learn a parameterized intrinsic reward function. The reward function used by the agent to learn the policy is the sum of the intrinsic and extrinsic reward functions. A deep reinforcement learning algorithm, intrinsic reward-deep deterministic policy gradient (IRDDPG), which combines the DDPG algorithm with the reward function optimization approach, is then proposed to learn the expected collision avoidance policy. Finally, the proposed algorithm is tested in a simulation environment, and the results show that the industrial robot can learn the expected policy, ensuring safety for industrial human-robot collaboration without missing the original target. Moreover, the reward function optimization approach compensates for deficiencies in the hand-designed reward function and improves policy performance.
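The abstract's core idea, combining a hand-designed extrinsic reward with a learned, parameterized intrinsic reward, can be illustrated with a minimal sketch. The function names, feature vector, and the linear form of the intrinsic reward below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def extrinsic_reward(dist_to_human, task_progress):
    # Hand-designed safety-shaped reward (assumed form): penalize close
    # proximity to the operator, reward progress toward the task goal.
    collision_penalty = -1.0 if dist_to_human < 0.2 else 0.0
    return collision_penalty + 0.1 * task_progress

def intrinsic_reward(features, theta):
    # Parameterized intrinsic reward r_int = phi(s, a) . theta, whose
    # parameters theta would be optimized by a policy-gradient-style
    # update in the full IRDDPG algorithm.
    return float(features @ theta)

# Hypothetical state-action features, e.g. distance, velocity, goal alignment.
theta = rng.normal(scale=0.01, size=3)
features = np.array([0.5, 0.3, 0.9])

# The agent learns its policy from the sum of the two reward terms.
r_total = extrinsic_reward(0.5, 0.3) + intrinsic_reward(features, theta)
```

In the paper's approach, theta is not fixed: it is itself learned via a deterministic-policy-gradient-based update so that the intrinsic term steers the policy toward lower accumulated risk without sacrificing task completion.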

