Article

A Method Integrating Q-Learning With Approximate Dynamic Programming for Gantry Work Cell Scheduling

Journal

IEEE Transactions on Automation Science and Engineering

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TASE.2020.2984739

Keywords

Approximate dynamic programming (ADP); gantry scheduling; Markov decision process (MDP); planning and learning; Q-learning

Funding

  1. U.S. National Science Foundation (NSF) [CMMI 1351160, CMMI 1853454]


This article proposes Q-ADP, a method that integrates reinforcement learning with approximate dynamic programming for real-time gantry scheduling in a gantry work cell. Numerical studies show that Q-ADP outperforms standard Q-learning and requires less data for convergence. By learning directly from interactions with the environment, the method avoids bias from model design, which makes it particularly useful when real data are limited.
This article formulates gantry real-time scheduling in a gantry work cell, where material transfer is driven by gantries, as a Markov decision process (MDP). Classical learning methods and planning methods for solving the optimization problems in an MDP are discussed. An innovative method, called Q-ADP, is proposed to integrate reinforcement learning (RL) with approximate dynamic programming (ADP). Q-ADP uses a model-free Q-learning algorithm to learn state values through interactions with the environment; meanwhile, the planning steps during learning use ADP to keep updating state values over several sample paths. A model of one-step transition probabilities, built from the machines' reliability model, serves the ADP algorithm. To demonstrate the effectiveness of this method, a numerical study compares its production performance with that of a standard Q-learning algorithm. The simulation results show that Q-ADP outperforms standard Q-learning for the same length of training. It is also shown that, by repeatedly updating state values through sample paths, Q-ADP requires less data for the gantry policy to converge, which makes the method promising when real data are limited.

Note to Practitioners: The goal of this work is to find a near-optimal gantry assignment policy that realizes real-time control of material-handling gantry/robot movements in gantry work cells. Properly assigning gantries based on the real-time state of the production system can prevent machine stoppages due to material shortage and consequently improve production performance. This gantry scheduling task is a sequential decision-making problem and can be represented as a Markov decision process (MDP). To solve the MDP, an algorithm integrating model-free Q-learning with model-based approximate dynamic programming (ADP) is proposed. By learning directly from interaction with the environment, the method avoids bias introduced by model design. Meanwhile, a planning process during learning can efficiently speed up convergence of the policy, which is particularly beneficial when real data are insufficient.
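
To make the abstract's description more concrete, the following Python sketch shows one common way such an integration can be structured: a model-free Q-learning update applied to real transitions, interleaved with planning sweeps that reuse the same update rule on sample paths drawn from a one-step transition model (e.g., one built from machine reliability data). This is only an illustrative, Dyna-style sketch, not the authors' implementation; the environment interface (env.reset, env.step), the transition_model and reward_model callables, and all hyperparameter values are assumptions introduced here for illustration.

    import random
    from collections import defaultdict

    # Illustrative sketch only: names and hyperparameters are assumptions,
    # not the paper's actual implementation.
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1      # learning rate, discount, exploration
    N_PLANNING_PATHS, PATH_LENGTH = 5, 20       # planning budget per real step

    Q = defaultdict(float)  # Q[(state, action)] -> value estimate, defaults to 0

    def epsilon_greedy(state, actions):
        """Pick a random action with probability EPSILON, else the greedy one."""
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(s, a, r, s_next, actions):
        """Standard one-step Q-learning update."""
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    def planning_sweep(transition_model, reward_model, actions, start_state):
        """Planning: roll out sample paths with an assumed one-step transition
        model and apply the same update rule to the simulated transitions."""
        for _ in range(N_PLANNING_PATHS):
            s = start_state
            for _ in range(PATH_LENGTH):
                a = epsilon_greedy(s, actions)
                s_next = transition_model(s, a)     # sampled from P(s' | s, a)
                r = reward_model(s, a, s_next)
                q_update(s, a, r, s_next, actions)
                s = s_next

    def train(env, transition_model, reward_model, actions, episodes=500):
        """Interleave model-free learning on real transitions with planning."""
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                a = epsilon_greedy(s, actions)
                s_next, r, done = env.step(a)       # real interaction (model-free)
                q_update(s, a, r, s_next, actions)
                planning_sweep(transition_model, reward_model, actions, s_next)
                s = s_next

The design choice being illustrated is the one the abstract emphasizes: real interactions keep the value estimates free of model bias, while the extra planning updates from sample paths let the policy converge with fewer real data points.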
