
Data-driven algorithm selection and tuning in optimization and signal processing

Annals of Mathematics and Artificial Intelligence

Abstract

Machine learning algorithms typically rely on optimization subroutines and are well known to provide very effective outcomes for many types of problems. Here, we flip that reliance and ask the reverse question: can machine learning algorithms lead to more effective outcomes for optimization problems? Our goal is to train machine learning methods to automatically improve the performance of optimization and signal processing algorithms. As a proof of concept, we use our approach to improve two popular data processing subroutines in data science: stochastic gradient descent and greedy methods in compressed sensing. We provide experimental results demonstrating that the answer is “yes”: machine learning algorithms do lead to more effective outcomes for optimization problems, and we show the future potential of this research direction. In addition to our experimental work, we prove relevant Probably Approximately Correct (PAC) learning theorems for our problems of interest. More precisely, we show that there exists a learning algorithm that, with high probability, selects the algorithm that optimizes the average performance over a set of problem instances drawn from a given distribution.
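To make the selection idea concrete, the following is a minimal, hypothetical sketch, not the paper's actual experimental setup: sample problem instances from a fixed distribution, run each candidate algorithm configuration on every instance, and keep the configuration with the best empirical average performance. The least-squares test problems, the SGD step-size grid, and all function names below are illustrative assumptions.

```python
# Illustrative sketch of data-driven algorithm selection: pick the SGD step size
# whose average final loss over sampled problem instances is smallest.
import numpy as np

rng = np.random.default_rng(0)

def sample_instance(n=50, d=10):
    """Draw a random least-squares instance: minimize ||Ax - b||^2 over x."""
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.01 * rng.normal(size=n)
    return A, b

def sgd_final_loss(A, b, step_size, epochs=5):
    """Run plain SGD with a fixed step size and return the final mean-squared loss."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = 2.0 * (A[i] @ x - b[i]) * A[i]  # gradient of (a_i^T x - b_i)^2
            x -= step_size * grad
    return float(np.mean((A @ x - b) ** 2))

# Candidate "algorithms": SGD with different fixed step sizes (illustrative grid).
candidates = [1e-4, 1e-3, 1e-2]

# Sample training instances from the problem distribution.
instances = [sample_instance() for _ in range(100)]

# Empirical average performance of each candidate; with enough samples this
# average concentrates around its expectation, which is the flavor of the
# PAC-style guarantee referred to in the abstract.
avg_loss = {s: np.mean([sgd_final_loss(A, b, s) for A, b in instances])
            for s in candidates}

best = min(avg_loss, key=avg_loss.get)
print(f"selected step size: {best:g}, average final loss: {avg_loss[best]:.4f}")
```

The same template applies to any finite family of algorithms or parameter settings: only the instance sampler and the per-instance performance measure change.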



Acknowledgments

This material was supported by the National Science Foundation under grant number DMS-1440140 while the authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the Fall 2017 semester. De Loera was funded by NSF DMS-1522158, NSF DMS-1818969, and an NSF TRIPODS grant (NSF Award No. CCF-1934568). Needell was funded by NSF CAREER DMS-1348721 and NSF BIGDATA 1740325.

Author information

Corresponding author

Correspondence to Jamie Haddock.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

De Loera, J.A., Haddock, J., Ma, A. et al. Data-driven algorithm selection and tuning in optimization and signal processing. Ann Math Artif Intell 89, 711–735 (2021). https://doi.org/10.1007/s10472-020-09717-z

