Abstract
Low-carbon logistics is an emerging, sustainable industry in the era of the low-carbon economy. End-to-end deep reinforcement learning (DRL) with an encoder-decoder framework has proven effective for solving logistics problems. In most cases, however, the encoders and decoders rely on recurrent neural networks (RNNs) and attention mechanisms, which can lead to the long-range dependency problem and to the correlation between attention results and query vectors being neglected. To overcome these problems, we propose an improved transformer model (TAOA) that combines a multi-head attention mechanism (MHA) with an attention to attention mechanism (AOA), and we apply it to the low-carbon multi-depot vehicle routing problem (MDVRP). In this model, the MHA and AOA are used in the encoder and decoder to compute the selection probabilities of route nodes. The MHA processes different parts of the input sequence and can be computed in parallel, while the AOA compensates for the missing correlation between query results and query vectors in the MHA. An actor-critic framework based on the policy gradient is constructed to train the model parameters, and the 2-opt operator is then used to further improve the resulting routes. Finally, extensive numerical studies are carried out to verify the effectiveness and computational efficiency of the proposed TAOA. The results show that TAOA performs better on the MDVRP than the traditional transformer model (Kool et al.), a genetic algorithm (GA), and Google OR-Tools.
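The 2-opt post-processing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `route_length` and `two_opt` are hypothetical, and the cost is assumed to be plain Euclidean distance on a single closed route, whereas the paper optimizes a low-carbon objective over multiple depots.

```python
import math


def route_length(route, coords):
    """Total Euclidean length of a closed route that returns to route[0]."""
    return sum(
        math.dist(coords[route[i]], coords[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def two_opt(route, coords):
    """Reverse route segments for as long as a reversal shortens the route.

    Assumption: route[0] is the depot and keeps its position.
    """
    best = list(route)
    best_len = route_length(best, coords)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment between positions i and j (inclusive).
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = route_length(candidate, coords)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len


if __name__ == "__main__":
    # Hypothetical instance: depot 0 and three customers on a unit square.
    coords = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (1.0, 0.0), 3: (0.0, 1.0)}
    print(two_opt([0, 1, 2, 3], coords))
```

On this toy instance the initial route crosses itself, and a single segment reversal removes the crossing. In a learned-routing pipeline such a local search would typically be applied to each vehicle route after decoding.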






Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under grant number 71971041; in part by the Outstanding Young Scientific and Technological Talents Foundation of Sichuan Province under grant number 2020JDJQ0035; and in part by the Major Program of the National Social Science Foundation of China under grant number 20&ZD084.
Cite this article
Zou, Y., Wu, H., Yin, Y. et al. An improved transformer model with multi-head attention and attention to attention for low-carbon multi-depot vehicle routing problem. Ann Oper Res 339, 517–536 (2024). https://doi.org/10.1007/s10479-022-04788-z
DOI: https://doi.org/10.1007/s10479-022-04788-z