Abstract
The P-Median Problem is one of the basic facility location problems and has been studied for many years. Most existing solution methods are based on classical heuristics or meta-heuristics, and they scale poorly to large instances in terms of running time. In this paper, we propose the first reinforcement learning-based method for the uncapacitated P-Median Problem: it uses Multi-Talking-Heads Graph Attention Networks to learn node representations and a learnable attention mechanism to construct solutions. We train the model with the REINFORCE algorithm and show that it performs well on the uncapacitated P-Median Problem in terms of both solution quality and time consumption. We also apply our model to a realistic dataset and empirically find that the difference between data distributions is one of the most important factors influencing final performance.
Notes
The performance of VNS was too poor to be worth reporting.
References
Guo T, Han C, Tang S, Ding M (2019) Solving combinatorial problems with machine learning methods. In: Nonlinear Combinatorial Optimization. Springer, pp 207–229
Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256
Gurobi Optimization LLC (2021) Gurobi Optimizer Reference Manual. https://www.gurobi.com
Cebecauer M, Buzna L (2018) Large-scale test data set for location problems. Data in brief 17:267–274
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Watkins CJCH, Dayan P (1992) Q-learning. Machine Learning 8(3-4):279–292
Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. In: Advances in neural information processing systems, pp 1008–1014
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning, PMLR, pp 387–395
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning, PMLR, pp 1889–1897
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347
Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2016) Reinforcement learning through asynchronous advantage actor-critic on a gpu. arXiv:1611.06256
Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research 17(1):1334–1373
Deng Y, Bao F, Kong Y, Ren Z, Dai Q (2016) Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems 28(3):653–664
Zheng G, Zhang F, Zheng Z, Xiang Y, Yuan NJ, Xie X, Li Z (2018) DRN: A deep reinforcement learning framework for news recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp 167–176
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489
Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144
Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T et al (2020) Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839):604–609
Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? arXiv:1807.03765
Jin C, Liu Q, Miryoosefi S (2021) Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms. arXiv:2102.00815
Duan Y, Jin C, Li Z (2021) Risk bounds and rademacher complexity in batch reinforcement learning. arXiv:2103.13883
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: A brief survey. IEEE Signal Proc Mag 34(6):26–38
Mousavi SS, Schukat M, Howley E (2016) Deep reinforcement learning: an overview. In: Proceedings of SAI Intelligent Systems Conference, Springer, pp 426–440
Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics 50(9):3826–3839
Asim M, Wang Y, Wang K, Huang P-Q (2020) A review on computational intelligence techniques in cloud and edge computing. IEEE Transactions on Emerging Topics in Computational Intelligence 4(6):742–763
Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. In: Advances in neural information processing systems, pp 2692–2700
Lu H, Zhang X, Yang S (2019) A learning-based iterative method for solving vehicle routing problems. In: International Conference on Learning Representations
Manchanda S, Mittal A, Dhawan A, Medya S, Ranu S, Singh A (2019) Learning heuristics over large graphs via deep reinforcement learning. arXiv:1903.03332
Mazyavkina N, Sviridov S, Ivanov S, Burnaev E (2021) Reinforcement learning for combinatorial optimization: A survey. Computers & Operations Research, p 105400
Cappart Q, Chételat D, Khalil E, Lodi A, Morris C, Veličković P (2021) Combinatorial optimization and reasoning with graph neural networks. arXiv:2102.09544
Nowak A, Villar S, Bandeira AS, Bruna J (2017) A note on learning algorithms for quadratic assignment with graph neural networks. stat 1050:22
Kool W, van Hoof H, Welling M (2019) Attention, learn to solve routing problems! In: 7th International Conference on Learning Representations (ICLR 2019), pp 1–25. arXiv:1803.08475
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903
Wu Y, Song W, Cao Z, Zhang J, Lim A (2021) Learning improvement heuristics for solving routing problems. IEEE Transactions on Neural Networks and Learning Systems
Fu Z-H, Qiu K-B, Zha H (2020) Generalize a small pre-trained model to arbitrarily large TSP instances. arXiv:2012.10658
Kool W, van Hoof H, Gromicho J, Welling M (2021) Deep policy dynamic programming for vehicle routing problems. arXiv:2102.11756
Lodi A, Mossina L, Rachelson E (2020) Learning to handle parameter perturbations in combinatorial optimization: an application to facility location. EURO Journal on Transportation and Logistics 9 (4):100023
Gamrath G, Anderson D, Bestuzheva K, Chen W-K, Eifler L, Gasse M, Gemander P, Gleixner A, Gottwald L, Halbig K et al (2020) The SCIP Optimization Suite 7.0
Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur J Oper Res 290(2):405–421
Vesselinova N, Steinert R, Perez-Ramirez DF, Boman M (2020) Learning combinatorial optimization on graphs: A survey with applications to networking. IEEE Access 8:120388–120416
Peng Y, Choi B, Xu J (2021) Graph learning for combinatorial optimization: A survey of state-of-the-art. Data Science and Engineering 6(2):119–141
Shazeer N, Lan Z, Cheng Y, Ding N, Hou L (2020) Talking-heads attention. arXiv:2003.02436
Joshi CK, Cappart Q, Rousseau L-M, Laurent T, Bresson X (2020) Learning TSP requires rethinking generalization. arXiv:2006.07054
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2021YFA1000403), the National Natural Science Foundation of China (Nos. 11991022, U19B2040), and the Fundamental Research Funds for the Central Universities.
Appendix A: Implementation Details of the Transformer Module
The attention mechanism is at the core of the Transformer [3]. In this section we present some technical details of its implementation. For a node in a graph, attention determines the weights of the messages it receives from its neighbor nodes; the weight of each message depends on the compatibility between the node's query and the neighbor's key. Formally, we denote the node representation by \(h_{i}\in \mathbb {R}^{d_{h}}\), parameterize matrices \(W^{K},W^{Q}\in \mathbb {R}^{d_{k}\times d_{h}}\) and \(W^{V}\in \mathbb {R}^{d_{v}\times d_{h}}\), and compute the key \(k_{i}\in \mathbb {R}^{d_{k}}\), value \(v_{i}\in \mathbb {R}^{d_{v}}\), and query \(q_{i}\in \mathbb {R}^{d_{k}}\) of each node as

\[k_{i}=W^{K}h_{i},\qquad v_{i}=W^{V}h_{i},\qquad q_{i}=W^{Q}h_{i}.\]
From these queries and keys we obtain the compatibility \(u_{ij}\in \mathbb {R}\) of node i's query with node j's key as the scaled dot product [3]:

\[u_{ij}=\frac{q_{i}^{\top }k_{j}}{\sqrt{d_{k}}}.\]
We then compute the attention weights \(a_{ij}\in [0,1]\) with the softmax function:

\[a_{ij}=\frac{e^{u_{ij}}}{{\sum }_{j^{\prime }}e^{u_{ij^{\prime }}}}.\]
Finally, we aggregate the weighted messages to obtain the new node representation:

\[h_{i}^{\prime }={\sum }_{j}a_{ij}v_{j}.\]
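As a concrete illustration, the following is a minimal NumPy sketch of the single-head attention described above; the function and variable names are ours, and the projection matrices are treated as given (in practice they are learned parameters).

```python
import numpy as np

def single_head_attention(H, W_Q, W_K, W_V):
    """Single-head scaled dot-product attention over a fully connected graph.

    H        : (n, d_h) array whose rows are the node representations h_i.
    W_Q, W_K : (d_k, d_h) projection matrices.
    W_V      : (d_v, d_h) projection matrix.
    Returns an (n, d_v) array of updated node representations h'_i.
    """
    Q = H @ W_Q.T                                  # queries q_i = W^Q h_i
    K = H @ W_K.T                                  # keys    k_i = W^K h_i
    V = H @ W_V.T                                  # values  v_i = W^V h_i

    d_k = Q.shape[1]
    U = (Q @ K.T) / np.sqrt(d_k)                   # compatibilities u_ij

    A = np.exp(U - U.max(axis=1, keepdims=True))   # numerically stable softmax
    A = A / A.sum(axis=1, keepdims=True)           # attention weights a_ij

    return A @ V                                   # h'_i = sum_j a_ij v_j
```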
Furthermore, as noted in [3], it is beneficial to use multi-head attention: multiple attention heads help capture different kinds of information from different neighbors. In particular, we parameterize a set of matrices \(\{{W^{K}_{m}},{W^{Q}_{m}}\in \mathbb {R}^{d_{k}\times d_{h}},{W^{V}_{m}}\in \mathbb {R}^{d_{v}\times d_{h}},m=1,2,\dots ,M\}\) and obtain M node representations \(\{ h_{i,m}^{\prime },m=1,2,\dots ,M\}\) for node i by the attention mechanism above. The final multi-head attention value for node i is obtained by

\[\mathrm {MHA}_{i}(h_{1},\dots ,h_{n})={\sum }_{m=1}^{M}{W_{m}^{O}}h_{i,m}^{\prime },\]

where \(\{{W_{m}^{O}}\in \mathbb {R}^{d_{h}\times d_{v}},m=1,2,\dots ,M\}\) are learnable output projections.
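Continuing the sketch above, a hypothetical multi-head version simply runs M independent heads and sums their projected outputs, matching the formula just given:

```python
def multi_head_attention(H, heads):
    """Multi-head attention as the sum of per-head projected outputs.

    heads: list of M tuples (W_Q, W_K, W_V, W_O),
           where W_O has shape (d_h, d_v).
    Returns an (n, d_h) array: MHA_i = sum_m W_m^O h'_{i,m}.
    """
    out = 0.0
    for W_Q, W_K, W_V, W_O in heads:
        H_m = single_head_attention(H, W_Q, W_K, W_V)  # (n, d_v) head output
        out = out + H_m @ W_O.T                        # project back to d_h and sum
    return out
```

Summing the projected heads is equivalent to the common implementation that concatenates the M head outputs and applies a single \(d_{h}\times Md_{v}\) projection.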
At the implementation level, we need two extra neural network modules to obtain these representations: a feed-forward sublayer and batch normalization [48].
The feed-forward sublayer computes each node's representation with a linear projection followed by a ReLU activation:

\[\mathrm {FF}(h_{i})=\mathrm {ReLU}(W^{\mathrm {ff}}h_{i}+b^{\mathrm {ff}}).\]
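A matching sketch, taken literally from the formula above (note that the original Transformer [3] additionally applies a second linear projection after the ReLU):

```python
def feed_forward(H, W_ff, b_ff):
    """Node-wise feed-forward sublayer: linear projection followed by ReLU.
    Applied to each row of H independently; [3] further projects the ReLU
    output back to d_h with a second linear layer."""
    return np.maximum(H @ W_ff.T + b_ff, 0.0)
```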
Batch normalization aims to reduce internal covariate shift during the training of neural networks [48]. For a batch of inputs \(\{x^{(k)},k=1,2,\dots \}\) with batch mean \(\mu _{B}\) and variance \(\sigma _{B}^{2}\), it computes

\[\hat{x}^{(k)}=\frac{x^{(k)}-\mu _{B}}{\sqrt{\sigma _{B}^{2}+\epsilon }},\qquad y^{(k)}=\gamma \hat{x}^{(k)}+\beta ,\]

where \(\gamma \) and \(\beta \) are learnable parameters and \(\epsilon \) is a small constant for numerical stability.
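For completeness, a small sketch of batch normalization at training time; gamma and beta are the learned scale and shift, and the eps default is our choice of stabilizer:

```python
def batch_norm(X, gamma, beta, eps=1e-5):
    """Normalize each feature of a batch {x^(k)} (rows of X) to zero mean
    and unit variance, then apply the learned scale and shift."""
    mu = X.mean(axis=0)                      # per-feature batch mean
    var = X.var(axis=0)                      # per-feature batch variance
    X_hat = (X - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * X_hat + beta              # y^(k) = gamma * x_hat^(k) + beta
```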
The overall pipeline of this process is shown in Fig. 10.
Cite this article
Wang, C., Han, C., Guo, T. et al. Solving uncapacitated P-Median problem with reinforcement learning assisted by graph attention networks. Appl Intell 53, 2010–2025 (2023). https://doi.org/10.1007/s10489-022-03453-z