Article

An online reinforcement learning approach to charging and order-dispatching optimization for an e-hailing electric vehicle fleet

Journal

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
Volume 310, Issue 3, Pages 1218-1233

Publisher

ELSEVIER
DOI: 10.1016/j.ejor.2023.03.039

Keywords

Transportation; Electric vehicle; Charging and dispatching decision; Reinforcement learning; Markov decision process

Abstract

Given the uncertainty of orders and the dynamically changing workload of charging stations, dispatching and charging an electric vehicle (EV) fleet is a significant challenge facing e-hailing platforms. The common practice is to dispatch EVs to serve orders via heuristic matching methods while letting EV drivers make charging decisions independently based on their own experience, which may compromise the platform's performance. This study proposes a Markov decision process to jointly optimize the charging and order-dispatching schemes for an e-hailing EV fleet that provides pick-up services for passengers only from a designated transportation hub (i.e., no pick-ups from other locations). The objective is to maximize the total revenue of the fleet over a finite horizon. The complete state transition equations of the EV fleet are formulated to track the state of charge of the vehicles' batteries. To learn the charging and order-dispatching policy in a dynamic stochastic environment, an online approximation algorithm is developed that integrates the model-based reinforcement learning (RL) framework with a novel SARSA(λ)-sample average approximation (SAA) architecture. Compared with the model-free RL algorithm and approximate dynamic programming (ADP), our algorithm explores high-quality decisions through an SAA model with empirical state transitions and exploits the best decisions found so far through SARSA(λ) sample-trajectory updates. Computational results based on a real case show that, compared with the existing heuristic method and the ADP in the literature, the proposed approach increases daily revenue by an average of 31.76% and 14.22%, respectively.

© 2023 Elsevier B.V. All rights reserved.
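To make the SARSA(λ) side of the architecture concrete, the sketch below shows a tabular SARSA(λ) backup with eligibility traces, paired with a toy battery state-of-charge (SoC) transition. Everything here is an illustrative assumption: the action set, the charging/consumption rates, and all constants are hypothetical, and the paper's actual algorithm additionally solves an SAA model with empirical state transitions at each decision epoch, which this sketch omits.

```python
import random
from collections import defaultdict

# Minimal sketch of two ingredients named in the abstract: a battery
# state-of-charge (SoC) transition and a tabular SARSA(lambda) backup.
# All constants, the action set, and the SoC dynamics are illustrative
# assumptions, not the paper's formulation.

ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.8, 0.1
ACTIONS = ("dispatch", "charge", "idle")   # hypothetical decision set

Q = defaultdict(float)       # state-action values keyed by (state, action)
trace = defaultdict(float)   # eligibility traces, same keys

def soc_transition(soc, action):
    """Toy SoC update: charging adds energy, dispatching consumes it."""
    if action == "charge":
        return min(1.0, soc + 0.2)     # assumed charging gain per period
    if action == "dispatch":
        return max(0.0, soc - 0.15)    # assumed consumption per order
    return soc                          # idle: no change

def epsilon_greedy(state):
    """Explore with probability EPS, otherwise act greedily on Q."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa_lambda_step(state, action, reward, next_state):
    """One on-policy SARSA(lambda) backup along the sampled trajectory."""
    next_action = epsilon_greedy(next_state)
    td_error = reward + GAMMA * Q[(next_state, next_action)] - Q[(state, action)]
    trace[(state, action)] += 1.0              # accumulating trace
    for key in list(trace):
        Q[key] += ALPHA * td_error * trace[key]
        trace[key] *= GAMMA * LAM              # decay every trace
    return next_action
```

In practice a continuous SoC would be discretized for a tabular representation; the paper's online algorithm uses the SAA model to propose high-quality candidate decisions and the SARSA(λ) trajectory update to exploit the best decisions found so far.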

