Abstract
At each iteration of the simplex method, there are typically many candidate entering columns. We use deep value-based reinforcement learning to choose dynamically between two popular pivoting rules. As test problems, we consider LP relaxations of the Miller-Tucker-Zemlin (MTZ) formulation of non-Euclidean traveling salesman problems (TSPs) with five cities. We obtain a 20–50% speedup on these very small instances. Although our methods are neither competitive nor viable on large instances, our results indicate that there may be scope to substantially accelerate current LP solvers by augmenting them with a learned pivoting strategy.



Acknowledgements
Tavaslıoğlu and Schaefer were partially supported by National Science Foundation grant CMMI-1933373.
Cite this article
Suriyanarayana, V., Tavaslıoğlu, O., Patel, A.B. et al. Reinforcement learning of simplex pivot rules: a proof of concept. Optim Lett 16, 2513–2525 (2022). https://doi.org/10.1007/s11590-022-01880-y
DOI: https://doi.org/10.1007/s11590-022-01880-y