Deep reinforcement learning for the dynamic and uncertain vehicle routing problem

Pan, Weixu; Liu, Shi Qiang

doi:10.1007/s10489-022-03456-w

Published: 18 April 2022

Volume 53, pages 405–422, (2023)
Applied Intelligence

Abstract

Accurate, real-time tracking of real-world urban logistics has become a popular research topic in the field of intelligent transportation. The routing of urban logistics services is usually accomplished via complex mathematical and analytical methods. However, real-world urban logistics are highly dynamic in nature and scope, and existing optimization techniques cannot precisely capture the dynamic characteristics of the routes. To ensure that customers’ demands are met, planners need to respond to these changes quickly (sometimes instantaneously). This paper proposes a novel deep reinforcement learning framework to solve a dynamic and uncertain vehicle routing problem (DU-VRP), whose objective is to meet the uncertain servicing needs of customers in a dynamic environment. Because information about customers’ demands is uncertain in this problem, a partially observable Markov decision process (POMDP) is designed to frequently observe changes in customers’ demands within a real-time decision support system that consists of a deep neural network with a dynamic attention mechanism. In addition, a cutting-edge reinforcement learning algorithm is presented to control the value function of the DU-VRP so that training better captures the dynamics and uncertainty of the routing process. Computational experiments on different data sources show that the approach obtains satisfactory solutions to the DU-VRP.
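The ideas named in the abstract (partial observation of uncertain demands and attention-style scoring of candidate customers) can be illustrated with a small toy. The sketch below is not the authors' model: the class `ToyDUVRP`, the functions `masked_softmax`, `attention_probs` and `rollout`, the Gaussian demand noise, and the use of negative distance as a stand-in for a learned attention score are all assumptions made purely for illustration.

```python
import math
import random


class ToyDUVRP:
    """Toy routing environment with hidden demands (illustrative assumption)."""

    def __init__(self, coords, mean_demand, capacity, seed=0):
        self.coords = coords            # index 0 is the depot
        self.mean_demand = mean_demand  # demand estimates; index 0 is unused
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        n = len(self.coords)
        # True demands are sampled and hidden until a customer is visited.
        self.true_demand = [0] + [
            max(1, round(self.rng.gauss(m, 1.0))) for m in self.mean_demand[1:]
        ]
        self.observed = list(self.mean_demand)  # the agent sees estimates only
        self.served = [True] + [False] * (n - 1)
        self.position = 0
        self.load = self.capacity
        return self.observation()

    def observation(self):
        return {
            "position": self.position,
            "load": self.load,
            "observed_demand": list(self.observed),
            "served": list(self.served),
        }

    def step(self, action):
        dist = math.dist(self.coords[self.position], self.coords[action])
        self.position = action
        if action == 0:
            self.load = self.capacity  # refill at the depot
        else:
            delivered = min(self.load, self.true_demand[action])
            self.load -= delivered
            self.true_demand[action] -= delivered
            # The remaining true demand becomes visible on arrival.
            self.observed[action] = self.true_demand[action]
            self.served[action] = self.true_demand[action] == 0
        return self.observation(), -dist, all(self.served)


def masked_softmax(scores, mask):
    """Softmax over the entries where mask is True; masked entries get 0."""
    top = max(s for s, ok in zip(scores, mask) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(obs, coords):
    """Score nodes by negative distance (stand-in for a learned attention score)."""
    here = coords[obs["position"]]
    scores = [-math.dist(here, c) for c in coords]
    if obs["load"] == 0:
        mask = [i == 0 for i in range(len(coords))]  # must return to the depot
    else:
        mask = [not s for s in obs["served"]]        # unserved customers only
    return masked_softmax(scores, mask)


def rollout(env, max_steps=1000):
    """Greedy rollout: always pick the most probable feasible node."""
    obs, done, cost, route = env.reset(), False, 0.0, []
    while not done and len(route) < max_steps:
        probs = attention_probs(obs, env.coords)
        action = max(range(len(probs)), key=probs.__getitem__)
        obs, reward, done = env.step(action)
        cost -= reward
        route.append(action)
    return cost, route
```

A possible usage: `cost, route = rollout(ToyDUVRP([(0, 0), (1, 0), (0, 2), (3, 3)], [0, 2, 3, 2], capacity=4))`. The mask is recomputed at every step from the latest observation, which is the sense in which the scoring is "dynamic" here; a trained policy would replace the distance heuristic with learned scores.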

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 71871064.

Author information

Authors and Affiliations

School of Economics and Management, Fuzhou University, Fuzhou, 350108, Fujian, China
Weixu Pan & Shi Qiang Liu

Corresponding author

Correspondence to Shi Qiang Liu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Weixu Pan and Shi Qiang Liu contributed equally to this work.

About this article

Cite this article

Pan, W., Liu, S. Deep reinforcement learning for the dynamic and uncertain vehicle routing problem. Appl Intell 53, 405–422 (2023). https://doi.org/10.1007/s10489-022-03456-w

Accepted: 01 March 2022
Published: 18 April 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03456-w
