Solving uncapacitated P-Median problem with reinforcement learning assisted by graph attention networks

Applied Intelligence

Abstract

The P-Median Problem is one of the basic facility location problems and has been studied for many years. Most existing methods for solving it are based on classical heuristics or meta-heuristics, and they do not perform well on large-scale instances in terms of running time. In this paper, we propose the first reinforcement learning-based method, which uses Multi-Talking-Heads Graph Attention Networks to learn representations and designs a learnable attention mechanism to solve the uncapacitated P-Median Problem. We train the model with the REINFORCE algorithm and show that it performs well on the uncapacitated P-Median Problem in terms of both solution quality and running time. We also apply our model to a real-world dataset and find empirically that the difference between data distributions is one of the most important factors influencing the final performance.

Notes

  1. The performance of VNS is too poor to be worth reporting.

    Table 2 Our model vs. baselines on generated data. The gap (%) is relative to the best value across all methods; Std denotes the standard deviation (%) of the gap; the best results are shown in boldface.

References

  1. Guo T, Han C, Tang S, Ding M (2019) Solving combinatorial problems with machine learning methods. In: Nonlinear Combinatorial Optimization. Springer, pp 207–229

  2. Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT Press

  3. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008

  4. Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8(3-4):229–256

  5. Gurobi Optimization LLC (2021) Gurobi Optimizer Reference Manual. https://www.gurobi.com

  6. Cebecauer M, Buzna L (2018) Large-scale test data set for location problems. Data in brief 17:267–274

  7. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

  8. Watkins CJCH, Dayan P (1992) Q-learning. Machine learning 8(3-4):279–292

  9. Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. In: Advances in neural information processing systems, pp 1008–1014

  10. Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning, PMLR, pp 387–395

  11. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning, PMLR, pp 1889–1897

  12. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971

  13. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347

  14. Babaeizadeh M, Frosio I, Tyree S, Clemons J, Kautz J (2016) Reinforcement learning through asynchronous advantage actor-critic on a gpu. arXiv:1611.06256

  15. Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research 17(1):1334–1373

  16. Deng Y, Bao F, Kong Y, Ren Z, Dai Q (2016) Deep direct reinforcement learning for financial signal representation and trading. IEEE transactions on neural networks and learning systems 28(3):653–664

  17. Zheng G, Zhang F, Zheng Z, Xiang Y, Yuan NJ, Xie X, Li Z (2018) DRN: A deep reinforcement learning framework for news recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp 167–176

  18. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489

  19. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419):1140–1144

  20. Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T et al (2020) Mastering atari, go, chess and shogi by planning with a learned model. Nature 588(7839):604–609

  21. Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? arXiv:1807.03765

  22. Jin C, Liu Q, Miryoosefi S (2021) Bellman eluder dimension: New rich classes of rl problems, and sample-efficient algorithms. arXiv:2102.00815

  23. Duan Y, Jin C, Li Z (2021) Risk bounds and rademacher complexity in batch reinforcement learning. arXiv:2103.13883

  24. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: A brief survey. IEEE Signal Proc Mag 34(6):26–38

  25. Mousavi SS, Schukat M, Howley E (2016) Deep reinforcement learning: an overview. In: Proceedings of SAI Intelligent Systems Conference, Springer, pp 426–440

  26. Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE transactions on cybernetics 50(9):3826–3839

  27. Asim M, Wang Y, Wang K, Huang P-Q (2020) A review on computational intelligence techniques in cloud and edge computing. IEEE Transactions on Emerging Topics in Computational Intelligence 4 (6):742–763

  28. Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. In: Advances in neural information processing systems, pp 2692–2700

  29. Lu H, Zhang X, Yang S (2019) A learning-based iterative method for solving vehicle routing problems. In: International Conference on Learning Representations

  30. Manchanda S, Mittal A, Dhawan A, Medya S, Ranu S, Singh A (2019) Learning heuristics over large graphs via deep reinforcement learning. arXiv:1903.03332

  31. Mazyavkina N, Sviridov S, Ivanov S, Burnaev E (2021) Reinforcement learning for combinatorial optimization: A survey. Computers & Operations Research, p 105400

  32. Cappart Q, Chételat D, Khalil E, Lodi A, Morris C, Veličković P (2021) Combinatorial optimization and reasoning with graph neural networks. arXiv:2102.09544

  33. Nowak A, Villar S, Bandeira AS, Bruna J (2017) A note on learning algorithms for quadratic assignment with graph neural networks. stat 1050:22

  34. Kool W, van Hoof H, Welling M (2019) Attention, learn to solve routing problems! In: 7th International Conference on Learning Representations (ICLR 2019). arXiv:1803.08475

  35. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903

  36. Wu Y, Song W, Cao Z, Zhang J, Lim A (2021) Learning improvement heuristics for solving routing problems. IEEE Transactions on Neural Networks and Learning Systems

  37. Fu Z-H, Qiu K-B, Zha H (2020) Generalize a small pre-trained model to arbitrarily large tsp instances. arXiv:2012.10658

  38. Kool W, van Hoof H, Gromicho J, Welling M (2021) Deep policy dynamic programming for vehicle routing problems. arXiv:2102.11756

  39. Lodi A, Mossina L, Rachelson E (2020) Learning to handle parameter perturbations in combinatorial optimization: an application to facility location. EURO Journal on Transportation and Logistics 9 (4):100023

  40. Gamrath G, Anderson D, Bestuzheva K, Chen W-K, Eifler L, Gasse M, Gemander P, Gleixner A, Gottwald L, Halbig K et al (2020) The SCIP Optimization Suite 7.0

  41. Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur J Oper Res 290(2):405–421

  42. Vesselinova N, Steinert R, Perez-Ramirez DF, Boman M (2020) Learning combinatorial optimization on graphs: A survey with applications to networking. IEEE Access 8:120388–120416

  43. Peng Y, Choi B, Xu J (2021) Graph learning for combinatorial optimization: A survey of state-of-the-art. Data Science and Engineering 6(2):119–141

  44. Shazeer N, Lan Z, Cheng Y, Ding N, Hou L (2020) Talking-heads attention. arXiv:2003.02436

  45. Joshi CK, Cappart Q, Rousseau L-M, Laurent T, Bresson X (2020) Learning tsp requires rethinking generalization. arXiv:2006.07054

  46. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980

  47. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  48. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167

Acknowledgements

This work is supported by the National Key Research and Development Program of China (2021YFA1000403), the National Natural Science Foundation of China (Nos. 11991022, U19B2040), and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Congying Han.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Implementation Details of the Transformer Module

The attention mechanism is the core building block of the Transformer [3]. In this section we present some technical details of the implementation. For a node in a graph, attention determines the weights of the messages it receives from its neighbor nodes; these weights depend on the compatibility between the node's query and its neighbors' keys. Formally, we denote the node representation by \(h_{i}\in \mathbb {R}^{d_{h}}\), introduce matrices \(W^{K},W^{Q}\in \mathbb {R}^{d_{k}\times d_{h}}\) and \(W^{V}\in \mathbb {R}^{d_{v}\times d_{h}}\), and compute the query \(q_{i}\in \mathbb {R}^{d_{k}}\), key \(k_{i}\in \mathbb {R}^{d_{k}}\), and value \(v_{i}\in \mathbb {R}^{d_{v}}\) for each node as follows:

$$ q_{i} = W^{Q} h_{i},\ k_{i} = W^{K} h_{i},\ v_{i} = W^{V} h_{i}. $$
(21)

From these queries and keys, we compute the compatibility \(u_{ij}\in \mathbb {R}\) of node i's query with node j's key as the scaled dot product [3]:

$$ u_{ij} = \begin{cases} \frac{{q_{i}}^{T} {k}_{j}}{\sqrt{d_{\text{k}}}} & \text{if } \text{node } i \text{ is adjacent to } j\\ -\infty & \text{otherwise.} \end{cases} $$
(22)

We then compute the attention weights \(a_{ij}\in [0,1]\) using the softmax function:

$$ a_{ij}=\frac{e^{u_{ij}}}{{\sum}_{j^{\prime}}e^{u_{ij^{\prime}}}} $$
(23)

Finally, we aggregate the messages to obtain a new node representation:

$$ h_{i}^{\prime}={\sum}_{j} a_{ij}v_{j} $$
(24)
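As a concrete illustration of (21)–(24), the following is a minimal PyTorch sketch of single-head attention restricted to graph neighbors. The function name, tensor shapes, and the toy fully connected graph are assumptions made for this example, not the authors' implementation.

```python
import math
import torch

def single_head_attention(h, adj, W_Q, W_K, W_V):
    """Minimal sketch of Eqs. (21)-(24): single-head attention over a graph.

    h:        (n, d_h) node representations
    adj:      (n, n)   boolean adjacency matrix (True where node i is adjacent to node j)
    W_Q, W_K: (d_k, d_h) projection matrices; W_V: (d_v, d_h)
    """
    q = h @ W_Q.T                            # queries q_i, Eq. (21)
    k = h @ W_K.T                            # keys    k_i
    v = h @ W_V.T                            # values  v_i
    u = (q @ k.T) / math.sqrt(k.shape[-1])   # compatibilities u_ij, Eq. (22)
    u = u.masked_fill(~adj, float("-inf"))   # non-neighbors receive -inf
    a = torch.softmax(u, dim=-1)             # attention weights a_ij, Eq. (23)
    return a @ v                             # new representations h'_i, Eq. (24)

# Illustrative usage with random data (shapes chosen arbitrarily)
h = torch.randn(5, 16)
adj = torch.ones(5, 5, dtype=torch.bool)     # fully connected toy graph
W_Q, W_K, W_V = torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16)
h_new = single_head_attention(h, adj, W_Q, W_K, W_V)  # shape (5, 8)
```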

Furthermore, as noted in [3], it is beneficial to use multi-head attention: multiple attention heads help capture different kinds of information from different neighbors. In particular, we parameterize a set of matrices \(\{{W^{K}_{m}},{W^{Q}_{m}}\in \mathbb {R}^{d_{k}\times d_{h}},{W^{V}_{m}}\in \mathbb {R}^{d_{v}\times d_{h}},\ m=1,2,\dots,M\}\) and obtain M node representations \(\{ h_{i,m}^{\prime },\ m=1,2,\dots,M\}\) for node i via the attention mechanism above. The final multi-head attention value for node i is obtained by:

$$ MHA_{i}(h_{1},...,h_{n})=\sum\limits_{m=1}^{M}{W_{m}^{O}}h_{i,m}^{\prime} $$
(25)

where \(\{{W_{m}^{O}}\in \mathbb {R}^{d_{h}\times d_{v}},\ m=1,2,\dots,M\}\) are learnable output projection matrices.
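A compact way to realize (25) in code is to stack all M heads into single projection matrices; summing \(W_{m}^{O}h_{i,m}^{\prime}\) over heads is then equivalent to one linear map applied to the concatenated head outputs. The sketch below again assumes PyTorch, and the dimensions (d_h = 128, M = 8) are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn as nn

class MultiHeadGraphAttention(nn.Module):
    """Sketch of Eq. (25): M heads of masked scaled dot-product attention."""

    def __init__(self, d_h=128, n_heads=8):
        super().__init__()
        assert d_h % n_heads == 0
        self.M, self.d_k = n_heads, d_h // n_heads
        # One linear layer per projection, holding all M heads side by side.
        self.W_Q = nn.Linear(d_h, d_h, bias=False)
        self.W_K = nn.Linear(d_h, d_h, bias=False)
        self.W_V = nn.Linear(d_h, d_h, bias=False)
        self.W_O = nn.Linear(d_h, d_h, bias=False)  # plays the role of the summed W_m^O

    def forward(self, h, adj):
        n = h.size(0)
        # (n, d_h) -> (M, n, d_k): split the feature dimension into heads.
        q = self.W_Q(h).view(n, self.M, self.d_k).transpose(0, 1)
        k = self.W_K(h).view(n, self.M, self.d_k).transpose(0, 1)
        v = self.W_V(h).view(n, self.M, self.d_k).transpose(0, 1)
        u = q @ k.transpose(-2, -1) / self.d_k ** 0.5   # (M, n, n) compatibilities
        u = u.masked_fill(~adj, float("-inf"))          # adjacency mask, broadcast over heads
        a = torch.softmax(u, dim=-1)
        out = (a @ v).transpose(0, 1).reshape(n, -1)    # concatenate the M head outputs
        return self.W_O(out)                            # MHA_i of Eq. (25)
```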

At the implementation level, we need two extra neural-network modules to complete these representations: a feed-forward sublayer and batch normalization [48].

The feed-forward sublayer transforms each node's representation with a linear projection, a ReLU activation, and a second linear projection:

$$ FF(h_{i})=W^{ff,1}\cdot ReLU(W^{ff,0}h_{i}+b^{ff,0})+b^{ff,1} $$
(26)
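Under the same illustrative assumptions, (26) corresponds to two linear layers with a ReLU in between; the hidden width of 512 below is an assumed value, not taken from the paper.

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Sketch of Eq. (26): W^{ff,1} ReLU(W^{ff,0} h_i + b^{ff,0}) + b^{ff,1}."""

    def __init__(self, d_h=128, d_ff=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_h, d_ff),   # W^{ff,0}, b^{ff,0}
            nn.ReLU(),
            nn.Linear(d_ff, d_h),   # W^{ff,1}, b^{ff,1}
        )

    def forward(self, h):
        return self.net(h)
```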

Batch normalization aims to reduce internal covariate shift during the training of neural networks; for a batch of inputs \(\{x^{(k)},\ k=1,2,\dots\}\) it computes:

$$ \hat{x}^{(k)}=\frac{x^{(k)}-E[x^{(k)}]}{\sqrt{Var[x^{(k)}]}} $$
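In practice this normalization is usually provided by the framework. A minimal sketch, assuming PyTorch, is shown below; nn.BatchNorm1d also adds a learnable affine transform, which can be disabled to match the formula above (up to a small epsilon added to the variance for numerical stability).

```python
import torch
import torch.nn as nn

# Normalize a batch of node embeddings feature-wise, as in the formula above.
bn = nn.BatchNorm1d(num_features=128, affine=False)
x = torch.randn(32, 128)   # 32 embeddings of dimension 128 (illustrative shapes)
x_hat = bn(x)              # per-feature zero mean and unit variance over the batch
```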

The overall pipeline of this process is shown in Fig. 10.

Fig. 10 The pipeline of the Transformer module


About this article

Cite this article

Wang, C., Han, C., Guo, T. et al. Solving uncapacitated P-Median problem with reinforcement learning assisted by graph attention networks. Appl Intell 53, 2010–2025 (2023). https://doi.org/10.1007/s10489-022-03453-z
