Proximal Policy Optimization for Same-Day Delivery with Drones and Vehicles

Li, Meng; Cai, Kaiquan; Zhao, Peng

doi:10.1007/978-981-97-0837-6_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2017)

Included in the following conference series:

International Conference on Data Mining and Big Data

Abstract

With surging demand for instant gratification in online shopping, same-day delivery with heterogeneous fleets of drones and vehicles provides new insights for decision makers. However, real-time decisions on the assignment and routing of vehicles and drones suffer from the “curse of dimensionality”, owing to stochastic and dynamic orders, huge state spaces, and interrelated and diverse decisions. This paper presents a deep reinforcement learning (DRL) based approach to this dynamic decision problem. First, a route-based Markov decision process is formulated to model the problem. Second, a DRL-based algorithm combining proximal policy optimization and heuristics (PPOh) is developed to decide whether to accept customer requests, how to assign orders, and how to plan fleet routes. Extensive computational experiments show that PPOh outperforms existing methods and clearly improves the service rates of fleets under the same workload.

Supported by National Natural Science Foundation of China (Grants No. U213320067).
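
The abstract names proximal policy optimization as the learning component of PPOh but does not spell out the objective. As a rough illustration only, the sketch below computes the standard clipped surrogate objective of generic PPO for a batch of sampled decisions. It is not the authors' PPOh implementation: the function name `ppo_clip_objective`, the clip range `eps=0.2`, and the toy inputs are assumptions made for this example, and the heuristic component of PPOh is not modeled.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, eps=0.2):
    """Mean clipped surrogate objective of generic PPO (to be maximized).

    new_logps / old_logps: log-probabilities of the sampled actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    eps: clip range (0.2 is an assumed, commonly used default).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical batch of three decisions (e.g. accept/reject choices).
example = ppo_clip_objective(
    new_logps=[math.log(0.6), math.log(0.3), math.log(0.5)],
    old_logps=[math.log(0.5), math.log(0.4), math.log(0.5)],
    advantages=[1.0, -0.5, 0.2],
)
```

The clipping removes the incentive to move the policy far from the one that collected the data: a positive advantage stops contributing extra gain once the ratio exceeds 1 + eps, while a negative advantage is still penalized in full when the ratio grows.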

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Beihang University, Beijing, 100191, China
Meng Li, Kaiquan Cai & Peng Zhao

Corresponding author

Correspondence to Peng Zhao.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Techn, Shenzhen, China
Yuhui Shi

About this paper

Cite this paper

Li, M., Cai, K., Zhao, P. (2024). Proximal Policy Optimization for Same-Day Delivery with Drones and Vehicles. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2023. Communications in Computer and Information Science, vol 2017. Springer, Singapore. https://doi.org/10.1007/978-981-97-0837-6_15

DOI: https://doi.org/10.1007/978-981-97-0837-6_15
Published: 22 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0836-9
Online ISBN: 978-981-97-0837-6
eBook Packages: Computer Science, Computer Science (R0)
