Abstract
With the rapid development of Online-to-Offline (O2O) business, millions of transactions are performed on popular online food ordering platforms each day. Efficient order dispatching and dynamic adjustment of delivery routes are critical to the success of O2O platforms. However, the vast volume of transactions and the computational complexity of delivery routing make efficient dispatching difficult. The action of dispatching an order and the resulting state transition of the couriers form a Markov decision process (MDP), and reinforcement learning has proven capable of solving MDPs. This paper proposes a Double Deep Q-Network (Double-DQN) based reinforcement learning framework that learns the order dispatching policy by trial and error while interacting with an O2O simulation model built with SUMO. Preliminary experimental results on real order data demonstrate the effectiveness and efficiency of the proposed Double-DQN based order dispatcher. In addition, different state encoding schemes are designed and tested to improve the performance of the Double-DQN based dispatcher.
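As a minimal sketch of the learning rule that the name Double-DQN refers to, the following computes the bootstrap targets of the standard Double-DQN update: the online network selects the next action and the target network evaluates it. The abstract does not specify the network architecture, reward, or state encoding, so the function name, the argument layout, and the reading of an action as a courier index are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_online_next: Sequence[Sequence[float]],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Compute Double-DQN targets for a batch of transitions.

    Hypothetical layout (an assumption, not taken from the paper):
    each row of q_online_next / q_target_next holds one Q-value per
    candidate action, e.g. one per courier an order could be sent to.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        # Action selection uses the online network's estimates.
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # Action evaluation uses the target network's estimate,
        # and terminal transitions do not bootstrap.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(r + bootstrap)
    return targets


if __name__ == "__main__":
    print(double_dqn_targets(
        rewards=[1.0, 0.0],
        dones=[False, True],
        q_online_next=[[0.2, 0.9, 0.1], [0.5, 0.4, 0.3]],
        q_target_next=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        gamma=0.5,
    ))
```

Separating selection from evaluation is what distinguishes Double-DQN from plain DQN, where a single network does both and tends to overestimate action values.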
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant Nos. 71771035 and 71831003).
Cite this article
Zou, G., Tang, J., Yilmaz, L. et al. Online food ordering delivery strategies based on deep reinforcement learning. Appl Intell 52, 6853–6865 (2022). https://doi.org/10.1007/s10489-021-02750-3
DOI: https://doi.org/10.1007/s10489-021-02750-3