Article

Scalable reinforcement learning approaches for dynamic pricing in ride-hailing systems

Journal

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.trb.2023.102848

Keywords

Ride-hailing; Dynamic pricing; Reinforcement learning


This study proposes a reinforcement learning-based approach for the dynamic pricing problem in ride-hailing systems. By translating the problem into a Markov Decision Process, the existence of a deterministic stationary optimal policy is proven. Using the offline learning algorithm TD3, the optimal pricing policy is learned from historical data and applied to the next time slot. Extensive numerical experiments demonstrate the effectiveness of the proposed algorithm in finding the optimal pricing policy and improving platform profit and service efficiency in both small and large networks.
Dynamic pricing is a strategy widely applied by ride-hailing companies, such as Uber and Lyft, to match trip demand with the availability of drivers. Deciding proper pricing policies is challenging, and existing reinforcement learning (RL)-based solutions are restricted to solving small-scale problems. In this study, we contribute RL-based approaches that can address the dynamic pricing problem in real-world-scale ride-hailing systems. We first characterize the dynamic pricing problem with a clear distinction between historical prices and current prices. We then translate our dynamic pricing problem into a Markov Decision Process (MDP) and prove the existence of a deterministic stationary optimal policy. Our solutions are based on an off-policy reinforcement learning algorithm called twin delayed deep deterministic policy gradient (TD3), which performs offline learning of the optimal pricing policy using historical data and applies the learned policy to the next time slot, e.g., one week. We enhance TD3 with three mechanisms that reduce model complexity and improve training effectiveness. Extensive numerical experiments are conducted on both small grid networks (16 zones) and the NYC network (242 zones) to demonstrate the performance of the proposed algorithm. The results show that our algorithm can efficiently find the optimal pricing policy for both the small and large networks, and can significantly enhance platform profit and service efficiency.
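The TD3 ingredients named in the abstract (twin critics, target-policy smoothing, delayed actor updates, and soft target-network updates) can be sketched in a toy offline setting. Everything below is an illustrative assumption rather than the authors' implementation: linear function approximators stand in for deep networks, the transition batch is synthetic rather than real trip data, and the reward is a hypothetical stand-in for platform profit.

```python
import numpy as np

# Toy TD3 sketch: all dimensions, rewards, and hyperparameters are
# illustrative assumptions, not the paper's settings.
rng = np.random.default_rng(0)
S_DIM, A_DIM = 4, 1                 # zone features (last is a bias) -> one price multiplier
GAMMA, TAU, LR = 0.9, 0.005, 1e-2   # discount, Polyak rate, learning rate
NOISE, CLIP, DELAY = 0.2, 0.5, 2    # target smoothing noise and delayed actor update

def q_val(w, s, a):                 # linear critic: Q(s, a) = w . [s, a]
    return w @ np.concatenate([s, a])

def act(pi, s):                     # deterministic policy; output in (-1, 1)
    return np.tanh(pi @ s)

# Twin critics, actor, and their slowly updated target copies.
q1 = rng.normal(0, 0.1, S_DIM + A_DIM)
q2 = rng.normal(0, 0.1, S_DIM + A_DIM)
pi = rng.normal(0, 0.1, (A_DIM, S_DIM))
q1_t, q2_t, pi_t = q1.copy(), q2.copy(), pi.copy()

def batch(n=64):                    # synthetic "historical" transitions (s, a, r, s')
    s = rng.normal(size=(n, S_DIM)); s[:, -1] = 1.0
    a = rng.uniform(-1, 1, size=(n, A_DIM))
    r = 1.0 - np.abs(a[:, 0] - 0.5)        # hypothetical profit, peaks at a = 0.5
    s2 = rng.normal(size=(n, S_DIM)); s2[:, -1] = 1.0
    return s, a, r, s2

for step in range(300):
    s, a, r, s2 = batch()
    for i in range(len(s)):
        # Target policy smoothing: clipped noise on the target action.
        eps = np.clip(rng.normal(0.0, NOISE, A_DIM), -CLIP, CLIP)
        a2 = np.clip(act(pi_t, s2[i]) + eps, -1.0, 1.0)
        # Clipped double-Q target: minimum of the twin target critics.
        y = r[i] + GAMMA * min(q_val(q1_t, s2[i], a2), q_val(q2_t, s2[i], a2))
        x = np.concatenate([s[i], a[i]])
        q1 += LR * (y - q1 @ x) * x        # TD step for each critic
        q2 += LR * (y - q2 @ x) * x
    if step % DELAY == 0:
        # Delayed deterministic policy gradient through critic 1.
        for i in range(len(s)):
            ai = act(pi, s[i])
            dq_da = q1[S_DIM:]             # dQ1/da for the linear critic
            pi += LR * np.outer(dq_da * (1.0 - ai**2), s[i])
        # Polyak soft-update of the target networks.
        q1_t += TAU * (q1 - q1_t)
        q2_t += TAU * (q2 - q2_t)
        pi_t += TAU * (pi - pi_t)

price_multiplier = act(pi, np.array([0.1, -0.2, 0.3, 1.0]))
```

In a deployment resembling the paper's setting, the bounded policy output would be rescaled to the platform's admissible price range per zone, and the transition batch would come from logged trips in the previous time slot rather than a synthetic generator.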
