Abstract
The P-Median Problem is one of the basic facility location problems and has been studied for many years. Most existing solution methods are based on classical heuristics or meta-heuristics, and they scale poorly to large instances in terms of running time. In this paper, we propose the first reinforcement learning-based method for the uncapacitated P-Median Problem: it uses Multi-Talking-Heads Graph Attention Networks to learn node representations and a learnable attention mechanism to construct solutions. We train the model with the REINFORCE algorithm and show that it performs well on the uncapacitated P-Median Problem in terms of both solution quality and time consumption. We also apply our model to a realistic dataset and empirically find that the difference between data distributions is one of the most important factors influencing final performance.
Notes
The performance of VNS was too poor to be worth reporting.
References
Guo T, Han C, Tang S, Ding M (2019) Solving combinatorial problems with machine learning methods. In: Nonlinear Combinatorial Optimization. Springer, pp 207–229
Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256
Gurobi Optimization LLC (2021) Gurobi Optimizer Reference Manual. https://www.gurobi.com
Cebecauer M, Buzna L (2018) Large-scale test data set for location problems. Data in brief 17:267–274
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Watkins CJCH, Dayan P (1992) Q-learning. Machine Learning 8(3-4):279–292
Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. In: Advances in neural information processing systems, pp 1008–1014
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning, PMLR, pp 387–395
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning, PMLR, pp 1889–1897
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347
Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2016) Reinforcement learning through asynchronous advantage actor-critic on a gpu. arXiv:1611.06256
Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research 17(1):1334–1373
Deng Y, Bao F, Kong Y, Ren Z, Dai Q (2016) Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems 28(3):653–664
Zheng G, Zhang F, Zheng Z, Xiang Y, Yuan NJ, Xie X, Li Z (2018) DRN: A deep reinforcement learning framework for news recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp 167–176
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489
Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144
Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T et al (2020) Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839):604–609
Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? arXiv:1807.03765
Jin C, Liu Q, Miryoosefi S (2021) Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms. arXiv:2102.00815
Duan Y, Jin C, Li Z (2021) Risk bounds and rademacher complexity in batch reinforcement learning. arXiv:2103.13883
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: A brief survey. IEEE Signal Proc Mag 34(6):26–38
Mousavi SS, Schukat M, Howley E (2016) Deep reinforcement learning: an overview. In: Proceedings of SAI Intelligent Systems Conference, Springer, pp 426–440
Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics 50(9):3826–3839
Asim M, Wang Y, Wang K, Huang P-Q (2020) A review on computational intelligence techniques in cloud and edge computing. IEEE Transactions on Emerging Topics in Computational Intelligence 4(6):742–763
Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. In: Advances in neural information processing systems, pp 2692–2700
Lu H, Zhang X, Yang S (2019) A learning-based iterative method for solving vehicle routing problems. In: International Conference on Learning Representations
Manchanda S, Mittal A, Dhawan A, Medya S, Ranu S, Singh A (2019) Learning heuristics over large graphs via deep reinforcement learning. arXiv:1903.03332
Mazyavkina N, Sviridov S, Ivanov S, Burnaev E (2021) Reinforcement learning for combinatorial optimization: A survey. Computers & Operations Research, p 105400
Cappart Q, Chételat D, Khalil E, Lodi A, Morris C, Veličković P (2021) Combinatorial optimization and reasoning with graph neural networks. arXiv:2102.09544
Nowak A, Villar S, Bandeira AS, Bruna J (2017) A note on learning algorithms for quadratic assignment with graph neural networks. stat 1050:22
Kool W, van Hoof H, Welling M (2019) Attention, learn to solve routing problems! In: 7th International Conference on Learning Representations (ICLR 2019), pp 1–25. arXiv:1803.08475
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903
Wu Y, Song W, Cao Z, Zhang J, Lim A (2021) Learning improvement heuristics for solving routing problems. IEEE Transactions on Neural Networks and Learning Systems
Fu Z-H, Qiu K-B, Zha H (2020) Generalize a small pre-trained model to arbitrarily large TSP instances. arXiv:2012.10658
Kool W, van Hoof H, Gromicho J, Welling M (2021) Deep policy dynamic programming for vehicle routing problems. arXiv:2102.11756
Lodi A, Mossina L, Rachelson E (2020) Learning to handle parameter perturbations in combinatorial optimization: an application to facility location. EURO Journal on Transportation and Logistics 9 (4):100023
Gamrath G, Anderson D, Bestuzheva K, Chen W-K, Eifler L, Gasse M, Gemander P, Gleixner A, Gottwald L, Halbig K et al (2020) The SCIP Optimization Suite 7.0
Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur J Oper Res 290(2):405–421
Vesselinova N, Steinert R, Perez-Ramirez DF, Boman M (2020) Learning combinatorial optimization on graphs: A survey with applications to networking. IEEE Access 8:120388–120416
Peng Y, Choi B, Xu J (2021) Graph learning for combinatorial optimization: A survey of state-of-the-art. Data Science and Engineering 6(2):119–141
Shazeer N, Lan Z, Cheng Y, Ding N, Hou L (2020) Talking-heads attention. arXiv:2003.02436
Joshi CK, Cappart Q, Rousseau L-M, Laurent T, Bresson X (2020) Learning TSP requires rethinking generalization. arXiv:2006.07054
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2021YFA1000403), the National Natural Science Foundation of China (Nos. 11991022, U19B2040), and the Fundamental Research Funds for the Central Universities.
Appendix A: Implementation Details of the Transformer Module
The attention mechanism is at the core of the Transformer [3]. In this section we present some technical details of its implementation. For a node in a graph, attention determines the weights of the messages it receives from its neighbor nodes; the weight of each message depends on the compatibility between the node's query and the neighbor's key. Formally, we denote the node representation by \(h_{i}\in \mathbb {R}^{d_{h}}\), parameterize matrices \(W^{K},W^{Q}\in \mathbb {R}^{d_{k}\times d_{h}}\) and \(W^{V}\in \mathbb {R}^{d_{v}\times d_{h}}\), and compute the key \(k_{i}\in \mathbb {R}^{d_{k}}\), value \(v_{i}\in \mathbb {R}^{d_{v}}\), and query \(q_{i}\in \mathbb {R}^{d_{k}}\) of each node as

\[k_{i}=W^{K}h_{i},\qquad v_{i}=W^{V}h_{i},\qquad q_{i}=W^{Q}h_{i}.\]
From these queries and keys we obtain the compatibility \(u_{ij}\in \mathbb {R}\) of node i's query with node j's key as the scaled dot product [3]:

\[u_{ij}=\frac{q_{i}^{\top }k_{j}}{\sqrt{d_{k}}}.\]
We then compute the attention weights \(a_{ij}\in [0,1]\) with the softmax function:

\[a_{ij}=\frac{e^{u_{ij}}}{{\sum }_{j^{\prime }}e^{u_{ij^{\prime }}}}.\]
Finally, we aggregate the weighted messages to obtain the new node representation:

\[h_{i}^{\prime }={\sum }_{j}a_{ij}v_{j}.\]
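As a concrete illustration, the following is a minimal NumPy sketch of the single-head attention described above; the function and variable names are ours, and the projection matrices are treated as given (in practice they are learned parameters).

```python
import numpy as np

def single_head_attention(H, W_Q, W_K, W_V):
    """Single-head scaled dot-product attention over a fully connected graph.

    H        : (n, d_h) array whose rows are the node representations h_i.
    W_Q, W_K : (d_k, d_h) projection matrices.
    W_V      : (d_v, d_h) projection matrix.
    Returns an (n, d_v) array of updated node representations h'_i.
    """
    Q = H @ W_Q.T                                  # queries q_i = W^Q h_i
    K = H @ W_K.T                                  # keys    k_i = W^K h_i
    V = H @ W_V.T                                  # values  v_i = W^V h_i

    d_k = Q.shape[1]
    U = (Q @ K.T) / np.sqrt(d_k)                   # compatibilities u_ij

    A = np.exp(U - U.max(axis=1, keepdims=True))   # numerically stable softmax
    A = A / A.sum(axis=1, keepdims=True)           # attention weights a_ij

    return A @ V                                   # h'_i = sum_j a_ij v_j
```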
Furthermore, as noted in [3], it is beneficial to use multi-head attention: multiple attention heads help capture different kinds of information from different neighbors. In particular, we parameterize a set of matrices \(\{{W^{K}_{m}},{W^{Q}_{m}}\in \mathbb {R}^{d_{k}\times d_{h}},{W^{V}_{m}}\in \mathbb {R}^{d_{v}\times d_{h}},m=1,2,\dots ,M\}\) and obtain M node representations \(\{ h_{i,m}^{\prime },m=1,2,\dots ,M\}\) for node i by the attention mechanism above. The final multi-head attention value for node i is obtained by

\[\mathrm {MHA}_{i}(h_{1},\dots ,h_{n})={\sum }_{m=1}^{M}{W_{m}^{O}}h_{i,m}^{\prime },\]

where \(\{{W_{m}^{O}}\in \mathbb {R}^{d_{h}\times d_{v}},m=1,2,\dots ,M\}\) are learnable output projections.
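Continuing the sketch above, a hypothetical multi-head version simply runs M independent heads and sums their projected outputs, matching the formula just given:

```python
def multi_head_attention(H, heads):
    """Multi-head attention as the sum of per-head projected outputs.

    heads: list of M tuples (W_Q, W_K, W_V, W_O),
           where W_O has shape (d_h, d_v).
    Returns an (n, d_h) array: MHA_i = sum_m W_m^O h'_{i,m}.
    """
    out = 0.0
    for W_Q, W_K, W_V, W_O in heads:
        H_m = single_head_attention(H, W_Q, W_K, W_V)  # (n, d_v) head output
        out = out + H_m @ W_O.T                        # project back to d_h and sum
    return out
```

Summing the projected heads is equivalent to the common implementation that concatenates the M head outputs and applies a single \(d_{h}\times Md_{v}\) projection.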
At the implementation level, we need two extra neural network modules to obtain these representations: a feed-forward sublayer and batch normalization [48].
The feed-forward sublayer computes each node's representation with a linear projection followed by a ReLU activation:

\[\mathrm {FF}(h_{i})=\mathrm {ReLU}(W^{\mathrm {ff}}h_{i}+b^{\mathrm {ff}}).\]
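A matching sketch, taken literally from the formula above (note that the original Transformer [3] additionally applies a second linear projection after the ReLU):

```python
def feed_forward(H, W_ff, b_ff):
    """Node-wise feed-forward sublayer: linear projection followed by ReLU.
    Applied to each row of H independently; [3] further projects the ReLU
    output back to d_h with a second linear layer."""
    return np.maximum(H @ W_ff.T + b_ff, 0.0)
```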
Batch normalization aims to reduce internal covariate shift during the training of neural networks [48]. For a batch of inputs \(\{x^{(k)},k=1,2,\dots \}\) with batch mean \(\mu _{B}\) and variance \(\sigma _{B}^{2}\), it computes

\[\hat{x}^{(k)}=\frac{x^{(k)}-\mu _{B}}{\sqrt{\sigma _{B}^{2}+\epsilon }},\qquad y^{(k)}=\gamma \hat{x}^{(k)}+\beta ,\]

where \(\gamma \) and \(\beta \) are learnable parameters and \(\epsilon \) is a small constant for numerical stability.
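For completeness, a small sketch of batch normalization at training time; gamma and beta are the learned scale and shift, and the eps default is our choice of stabilizer:

```python
def batch_norm(X, gamma, beta, eps=1e-5):
    """Normalize each feature of a batch {x^(k)} (rows of X) to zero mean
    and unit variance, then apply the learned scale and shift."""
    mu = X.mean(axis=0)                      # per-feature batch mean
    var = X.var(axis=0)                      # per-feature batch variance
    X_hat = (X - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * X_hat + beta              # y^(k) = gamma * x_hat^(k) + beta
```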
The overall pipeline of this process is shown in Fig. 10.
Cite this article
Wang, C., Han, C., Guo, T. et al. Solving uncapacitated P-Median problem with reinforcement learning assisted by graph attention networks. Appl Intell 53, 2010–2025 (2023). https://doi.org/10.1007/s10489-022-03453-z