Reinforcement learning of simplex pivot rules: a proof of concept

Suriyanarayana, Varun; Tavaslıoğlu, Onur; Patel, Ankit B.; Schaefer, Andrew J.

doi:10.1007/s11590-022-01880-y

Short Communication
Published: 22 April 2022

Volume 16, pages 2513–2525, (2022)

Optimization Letters

Varun Suriyanarayana¹,
Onur Tavaslıoğlu²,
Ankit B. Patel³,⁴ &
Andrew J. Schaefer⁵ (ORCID: orcid.org/0000-0002-0379-741X)

Abstract

At each iteration of the simplex method there are typically many possible entering columns. We use deep value-based reinforcement learning to choose dynamically between two popular pivoting rules. We consider LP relaxations of the MTZ formulation of non-Euclidean TSPs with five cities. We obtain a 20–50% speedup on these very small instances. Although our methods are not remotely competitive or viable on large instances, our results indicate that there may be scope to substantially accelerate current LP solvers by augmenting them with a learned pivoting strategy.
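
As an illustration of what choosing between pivoting rules at each iteration can look like, the following is a minimal sketch and not the authors' implementation. It assumes the two rules are Dantzig's rule and steepest edge, which the abstract does not name, and it replaces the learned value network with a hypothetical callback `policy(T, iteration)` that returns the rule to apply. The dense tableau solver, the function names, and the small test LP are all illustrative assumptions; there is no anti-cycling safeguard.

```python
import math

EPS = 1e-9

def dantzig(T, candidates):
    """Dantzig's rule: enter the column with the most negative reduced cost."""
    return min(candidates, key=lambda j: T[-1][j])

def steepest_edge(T, candidates):
    """Steepest edge: reduced cost divided by the length of the edge direction."""
    def score(j):
        edge_norm = math.sqrt(1.0 + sum(row[j] ** 2 for row in T[:-1]))
        return T[-1][j] / edge_norm
    return min(candidates, key=score)

def simplex(c, A, b, policy, max_iter=1000):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0.

    policy(T, iteration) returns the pivot rule to use at this iteration.
    Returns (x, objective value, number of pivots).
    """
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]; the last row holds the reduced costs [-c | 0 | 0].
    T = [[float(v) for v in A[i]] + [float(k == i) for k in range(m)] + [float(b[i])]
         for i in range(m)]
    T.append([-float(v) for v in c] + [0.0] * (m + 1))
    basis = list(range(n, n + m))  # start from the slack basis
    for iteration in range(max_iter):
        # Entering candidates: columns with a negative reduced cost.
        candidates = [j for j in range(n + m) if T[-1][j] < -EPS]
        if not candidates:  # no improving column, so the basis is optimal
            x = [0.0] * (n + m)
            for i, j in enumerate(basis):
                x[j] = T[i][-1]
            return x[:n], T[-1][-1], iteration
        rule = policy(T, iteration)  # stand-in for the learned choice of rule
        j = rule(T, candidates)
        rows = [i for i in range(m) if T[i][j] > EPS]
        if not rows:
            raise ValueError("LP is unbounded")
        r = min(rows, key=lambda i: T[i][-1] / T[i][j])  # minimum ratio test
        pivot = T[r][j]
        T[r] = [v / pivot for v in T[r]]
        for i in range(m + 1):
            if i != r and T[i][j] != 0.0:
                f = T[i][j]
                T[i] = [a - f * p for a, p in zip(T[i], T[r])]
        basis[r] = j
    raise RuntimeError("iteration limit reached")

def alternate(T, iteration):
    # Hypothetical hand-written policy: alternate between the two rules.
    return dantzig if iteration % 2 == 0 else steepest_edge

x, value, pivots = simplex([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18], alternate)
print(x, value, pivots)
```

In a learned variant, `policy` would map features of the current tableau to one of the two rules, and the pivot count returned by `simplex` is one natural quantity to compare across policies.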

Acknowledgements

Tavaslıoğlu and Schaefer were partially supported by National Science Foundation grant CMMI-1933373.

Author information

Authors and Affiliations

School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, 14850, USA
Varun Suriyanarayana
Amazon, Bellevue, WA, 98004, USA
Onur Tavaslıoğlu
Department of Neuroscience, Baylor College of Medicine, Houston, TX, 77030, USA
Ankit B. Patel
Department of Electrical and Computer Engineering, Rice University, Houston, TX, 77005, USA
Ankit B. Patel
Department of Computational and Applied Mathematics, Rice University, Houston, TX, 77005, USA
Andrew J. Schaefer

Corresponding author

Correspondence to Andrew J. Schaefer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suriyanarayana, V., Tavaslıoğlu, O., Patel, A.B. et al. Reinforcement learning of simplex pivot rules: a proof of concept. Optim Lett 16, 2513–2525 (2022). https://doi.org/10.1007/s11590-022-01880-y

Received: 16 August 2021
Accepted: 17 March 2022
Published: 22 April 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11590-022-01880-y
