Abstract
With surging demand for instant gratification in online shopping, offering same-day delivery with heterogeneous fleets of drones and vehicles provides decision makers with new insights. However, real-time decisions on the assignment and routing of vehicles and drones suffer from the “curse of dimensionality” because orders are stochastic and dynamic, state spaces are huge, and the decisions are diverse and interdependent. This paper presents a deep reinforcement learning (DRL) based approach to this dynamic decision problem. First, a route-based Markov decision process is formulated to model the problem. Second, a DRL-based algorithm combining proximal policy optimization and heuristics (PPOh) is developed to decide whether to accept customer requests, how to assign orders, and how to plan fleet routes. Extensive computational experiments show that PPOh outperforms existing methods and clearly improves fleet service rates under the same workload.
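The abstract does not describe the state encoding, network architecture, or heuristics of PPOh, so none of these can be reproduced here. As a rough illustration of the proximal policy optimization component only, the following minimal sketch computes the generic clipped surrogate objective introduced by Schulman et al. (2017). The function name, the argument names, and the clipping value of 0.2 are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of sampled decisions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for those actions.
    clip_eps: clipping range; 0.2 is an assumed, commonly used value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a setting like the one in the abstract, each sampled decision might correspond to accepting or rejecting a request, or to assigning an order to a drone or a vehicle; that mapping is an assumption of this sketch. The objective is maximized during training, and the clipping removes the incentive to move the policy far from the one that generated the data.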
Supported by the National Natural Science Foundation of China (Grant No. U213320067).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, M., Cai, K., Zhao, P. (2024). Proximal Policy Optimization for Same-Day Delivery with Drones and Vehicles. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2023. Communications in Computer and Information Science, vol 2017. Springer, Singapore. https://doi.org/10.1007/978-981-97-0837-6_15
DOI: https://doi.org/10.1007/978-981-97-0837-6_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0836-9
Online ISBN: 978-981-97-0837-6
eBook Packages: Computer Science, Computer Science (R0)