Journal
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
Volume 310, Issue 3, Pages 1218-1233
Publisher
ELSEVIER
DOI: 10.1016/j.ejor.2023.03.039
Keywords
Transportation; Electric vehicle; Charging and dispatching decision; Reinforcement learning; Markov decision process
Given the uncertainty of orders and the dynamically changing workload of charging stations, how to dispatch and charge electric vehicle (EV) fleets becomes a significant challenge facing e-hailing platforms. The common practice is to dispatch EVs to serve orders by heuristic matching methods but enable EV drivers to independently make charging decisions based on their experiences, which may compromise the platform's performance. This study proposes a Markov decision process to jointly optimize the charging and order-dispatching schemes for an e-hailing EV fleet, which provides pick-up services for passengers only from a designated transportation hub (i.e., no pick-up from different locations). The objective is to maximize the total revenue of the fleet throughout a finite horizon. The complete state transition equations of the EV fleet are formulated to track the state-of-charge of their batteries. To learn the charging and order-dispatching policy in a dynamic stochastic environment, an online approximation algorithm is developed, which integrates the model-based reinforcement learning (RL) framework with a novel SARSA(λ)-sample average approximation (SAA) architecture. Compared with the model-free RL algorithm and approximate dynamic programming (ADP), our algorithm explores high-quality decisions by an SAA model with empirical state transitions and exploits the best decisions so far by a SARSA(λ) sample-trajectory updating. Computational results based on a real case show that, compared with the existing heuristic method and the ADP in the literature, the proposed approach increases the daily revenue by an average of 31.76% and 14.22%, respectively. © 2023 Elsevier B.V. All rights reserved.
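As background for the SARSA(λ) component of the architecture described above, the following is a minimal, generic tabular SARSA(λ) sketch with accumulating eligibility traces. It is not the paper's fleet model: the environment interface (`env_step`), the state/action encoding, and all hyperparameter values are illustrative assumptions; the paper additionally couples this trajectory-based updating with an SAA model over empirical state transitions, which is not reproduced here.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, n_actions, eps):
    # Exploration: random action with probability eps, else greedy on Q.
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(s, a)])

def sarsa_lambda(env_step, n_actions, episodes=100,
                 alpha=0.1, gamma=0.95, lam=0.9, eps=0.1):
    # env_step(s, a) -> (reward, next_state, done) is an assumed interface.
    Q = defaultdict(float)               # Q[(s, a)] action-value estimates
    for _ in range(episodes):
        E = defaultdict(float)           # eligibility traces, reset per episode
        s = 0                            # assumed initial state
        a = epsilon_greedy(Q, s, n_actions, eps)
        done = False
        while not done:
            r, s2, done = env_step(s, a)
            a2 = epsilon_greedy(Q, s2, n_actions, eps)
            # TD error of the on-policy (SARSA) target.
            delta = r + (0.0 if done else gamma * Q[(s2, a2)]) - Q[(s, a)]
            E[(s, a)] += 1.0             # accumulating trace
            # Propagate the TD error backward along the sample trajectory.
            for key in list(E):
                Q[key] += alpha * delta * E[key]
                E[key] *= gamma * lam
            s, a = s2, a2
    return Q
```

The eligibility traces are what distinguish SARSA(λ) from one-step SARSA: a single reward (e.g., revenue from a served order) updates all recently visited state-action pairs at once, with geometrically decaying weight, which speeds credit assignment over long horizons.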