Deep reinforcement learning for the dynamic and uncertain vehicle routing problem

Applied Intelligence

Abstract

Accurate, real-time tracking for real-world urban logistics has become a popular research topic in the field of intelligent transportation. The routing of urban logistics services is usually accomplished via complex mathematical and analytical methods. However, the nature and scope of real-world urban logistics are highly dynamic, and existing optimization techniques cannot precisely formulate the dynamic characteristics of a route. To ensure that customers’ demands are met, planners need to respond to these changes quickly (sometimes instantaneously). This paper proposes a novel deep reinforcement learning framework to solve a dynamic and uncertain vehicle routing problem (DU-VRP), whose objective is to meet the uncertain servicing needs of customers in a dynamic environment. Because information about customers’ demands is uncertain in this problem, a partially observable Markov decision process is designed to frequently observe changes in customers’ demands within a real-time decision support system that consists of a deep neural network with a dynamic attention mechanism. In addition, a cutting-edge reinforcement learning algorithm is presented to control the value function of the DU-VRP so that training better captures the dynamics and uncertainty of the routing process. Computational experiments are conducted on different data sources to obtain satisfactory solutions of the DU-VRP.
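
To make the partially observable setting concrete, the following minimal Python sketch models a single vehicle whose customers' true demands are hidden and drift over time, while the planner sees only noisy estimates. It is an illustration under stated assumptions, not the framework proposed in this paper: the class `ToyDUVRP`, the functions `greedy_policy` and `rollout`, the noise and drift parameters, and the nearest-customer rule (a hand-written stand-in for a learned attention-based policy) are all hypothetical.

```python
import math
import random


class ToyDUVRP:
    """Toy single-vehicle routing task with hidden, drifting demands (illustrative only)."""

    def __init__(self, coords, capacity=10.0, noise=0.5, drift=0.2, seed=0):
        self.coords = coords      # coords[0] is the depot, the rest are customers
        self.capacity = capacity  # vehicle capacity (assumed value)
        self.noise = noise        # std-dev of observation noise (assumed value)
        self.drift = drift        # std-dev of per-step demand drift (assumed value)
        self.seed = seed

    def reset(self):
        self.rng = random.Random(self.seed)
        n = len(self.coords)
        # True demands are never shown to the policy directly.
        self.demand = [0.0] + [self.rng.uniform(1.0, 4.0) for _ in range(n - 1)]
        self.served = [True] + [False] * (n - 1)  # the depot needs no service
        self.pos = 0
        self.load = self.capacity
        return self._observe()

    def _observe(self):
        # The observation carries noisy demand estimates, not the true demands.
        estimate = [
            0.0 if self.served[i] else max(0.0, d + self.rng.gauss(0.0, self.noise))
            for i, d in enumerate(self.demand)
        ]
        return {"pos": self.pos, "load": self.load,
                "served": list(self.served), "demand_estimate": estimate}

    def step(self, action):
        dist = math.dist(self.coords[self.pos], self.coords[action])
        self.pos = action
        if action == 0:
            self.load = self.capacity  # refill at the depot
        elif not self.served[action]:
            delivered = min(self.load, self.demand[action])  # true demand revealed on arrival
            self.load -= delivered
            self.demand[action] -= delivered
            if self.demand[action] <= 1e-9:
                self.demand[action] = 0.0
                self.served[action] = True
        # Demands of customers still waiting drift between decisions.
        for i in range(1, len(self.coords)):
            if not self.served[i]:
                self.demand[i] = max(0.1, self.demand[i] + self.rng.gauss(0.0, self.drift))
        done = all(self.served) and self.pos == 0
        return self._observe(), -dist, done  # reward is negative travel distance


def greedy_policy(obs, coords):
    """Placeholder for a learned policy: nearest pending customer, refill when short."""
    pending = [i for i, s in enumerate(obs["served"]) if not s]
    if not pending or obs["load"] <= 0.0:
        return 0
    here = coords[obs["pos"]]
    target = min(pending, key=lambda i: math.dist(here, coords[i]))
    if obs["pos"] != 0 and obs["load"] < obs["demand_estimate"][target]:
        return 0  # estimated demand exceeds remaining load, so refill first
    return target


def rollout(env, max_steps=200):
    """Run one episode and return (total travel distance, whether it finished)."""
    obs, total, done = env.reset(), 0.0, False
    for _ in range(max_steps):
        obs, reward, done = env.step(greedy_policy(obs, env.coords))
        total -= reward
        if done:
            break
    return total, done


if __name__ == "__main__":
    env = ToyDUVRP([(0, 0), (1, 0), (0, 2), (3, 1), (2, 2)], seed=1)
    distance, finished = rollout(env)
    print(f"finished={finished}, total distance={distance:.2f}")
```

In a learning-based approach, `greedy_policy` would be replaced by a trained network that maps each observation to the next node to visit; the environment interface would stay the same.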

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 71871064.

Author information

Corresponding author

Correspondence to Shi Qiang Liu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflicts of interest regarding this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Weixu Pan and Shi Qiang Liu contributed equally to this work.

About this article

Cite this article

Pan, W., Liu, S. Deep reinforcement learning for the dynamic and uncertain vehicle routing problem. Appl Intell 53, 405–422 (2023). https://doi.org/10.1007/s10489-022-03456-w

  • DOI: https://doi.org/10.1007/s10489-022-03456-w
