Proximal Policy Optimization for Same-Day Delivery with Drones and Vehicles

  • Conference paper
Data Mining and Big Data (DMBD 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2017)

Abstract

With surging demand for instant gratification in online shopping, offering same-day delivery with heterogeneous fleets of drones and vehicles provides new insights for decision makers. However, real-time decisions on the assignment and routing of vehicles and drones suffer from the “curse of dimensionality” because orders are stochastic and dynamic, the state space is huge, and the decisions are interrelated and diverse. In this paper, a deep reinforcement learning (DRL) based approach is presented to handle this dynamic decision problem. First, a route-based Markov decision process is formulated to model the problem. Second, a DRL-based algorithm combining proximal policy optimization and heuristics (PPOh) is developed to decide whether to accept customer requests, how to assign orders, and how to plan the routes of the fleets. Extensive computational experiments show that PPOh outperforms existing methods and clearly improves the service rate of the fleets under the same workload.
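For readers unfamiliar with proximal policy optimization, the following is a minimal sketch of the standard clipped surrogate objective from Schulman et al. (2017). It is not the PPOh algorithm of this paper: the heuristic components, the state encoding, and the network architecture are not described in the abstract and are not shown. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a same-day delivery setting, an action might be accepting or rejecting a request, or assigning it to a drone or a vehicle. The clipping limits how far a single update can move the policy away from the one that generated the data.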

Supported by the National Natural Science Foundation of China (Grant No. U213320067).

Corresponding author

Correspondence to Peng Zhao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, M., Cai, K., Zhao, P. (2024). Proximal Policy Optimization for Same-Day Delivery with Drones and Vehicles. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2023. Communications in Computer and Information Science, vol 2017. Springer, Singapore. https://doi.org/10.1007/978-981-97-0837-6_15

  • DOI: https://doi.org/10.1007/978-981-97-0837-6_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0836-9

  • Online ISBN: 978-981-97-0837-6

  • eBook Packages: Computer Science, Computer Science (R0)
