
Data-driven algorithm selection and tuning in optimization and signal processing

Annals of Mathematics and Artificial Intelligence

Abstract

Machine learning algorithms typically rely on optimization subroutines and are well known to provide very effective outcomes for many types of problems. Here, we flip that reliance and ask the reverse question: can machine learning algorithms lead to more effective outcomes for optimization problems? Our goal is to train machine learning methods to automatically improve the performance of optimization and signal processing algorithms. As a proof of concept, we use our approach to improve two popular data processing subroutines in data science: stochastic gradient descent and greedy methods in compressed sensing. We provide experimental results demonstrating that the answer is “yes”: machine learning algorithms do lead to more effective outcomes for optimization problems, and we show the future potential of this research direction. In addition to our experimental work, we prove relevant Probably Approximately Correct (PAC) learning theorems for our problems of interest. More precisely, we show that there exists a learning algorithm that, with high probability, selects the algorithm that optimizes the average performance over a set of problem instances drawn from a given distribution.
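To make the selection idea concrete, the following is a minimal, hypothetical sketch, not the paper's actual experimental setup: sample problem instances from a fixed distribution, run each candidate algorithm configuration on every instance, and keep the configuration with the best empirical average performance. The least-squares test problems, the SGD step-size grid, and all function names below are illustrative assumptions.

```python
# Illustrative sketch of data-driven algorithm selection: pick the SGD step size
# whose average final loss over sampled problem instances is smallest.
import numpy as np

rng = np.random.default_rng(0)

def sample_instance(n=50, d=10):
    """Draw a random least-squares instance: minimize ||Ax - b||^2 over x."""
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.01 * rng.normal(size=n)
    return A, b

def sgd_final_loss(A, b, step_size, epochs=5):
    """Run plain SGD with a fixed step size and return the final mean-squared loss."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = 2.0 * (A[i] @ x - b[i]) * A[i]  # gradient of (a_i^T x - b_i)^2
            x -= step_size * grad
    return float(np.mean((A @ x - b) ** 2))

# Candidate "algorithms": SGD with different fixed step sizes (illustrative grid).
candidates = [1e-4, 1e-3, 1e-2]

# Sample training instances from the problem distribution.
instances = [sample_instance() for _ in range(100)]

# Empirical average performance of each candidate; with enough samples this
# average concentrates around its expectation, which is the flavor of the
# PAC-style guarantee referred to in the abstract.
avg_loss = {s: np.mean([sgd_final_loss(A, b, s) for A, b in instances])
            for s in candidates}

best = min(avg_loss, key=avg_loss.get)
print(f"selected step size: {best:g}, average final loss: {avg_loss[best]:.4f}")
```

The same template applies to any finite family of algorithms or parameter settings: only the instance sampler and the per-instance performance measure change.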



Acknowledgments

This material was supported by the National Science Foundation under grant number DMS-1440140 while the authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the Fall 2017 semester. De Loera was funded by NSF DMS-1522158, NSF DMS-1818969, and an NSF TRIPODS grant (NSF Award No. CCF-1934568). Needell was funded by NSF CAREER DMS-1348721 and NSF BIGDATA 1740325.

Author information

Corresponding author

Correspondence to Jamie Haddock.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

De Loera, J.A., Haddock, J., Ma, A. et al. Data-driven algorithm selection and tuning in optimization and signal processing. Ann Math Artif Intell 89, 711–735 (2021). https://doi.org/10.1007/s10472-020-09717-z

