
Efficient zeroth-order proximal stochastic method for nonconvex nonsmooth black-box problems

Published in: Machine Learning

Abstract

The proximal gradient method plays a major role in solving nonsmooth composite optimization problems. However, in machine learning problems involving black-box models, the proximal gradient method cannot be applied because explicit gradients are difficult or impossible to derive. Several zeroth-order (ZO) stochastic variance-reduced algorithms, such as ZO-SVRG and ZO-SPIDER, have recently been studied for nonconvex optimization problems. However, almost all existing ZO-type algorithms suffer from a slowdown, with function query complexities increased by a small-degree polynomial of the problem size. To fill this void, we propose a new analysis of proximal stochastic gradient algorithms for nonconvex, nonsmooth finite-sum problems, called ZO-PSVRG+ and ZO-PSPIDER+. The main goal of this work is a unified convergence analysis for ZO-PSVRG+ and ZO-PSPIDER+ that recovers several existing convergence results for arbitrary minibatch sizes while improving the complexity of their ZO-oracle and proximal-oracle calls. We prove that, under the Polyak-Łojasiewicz (PL) condition, the studied ZO algorithms, in contrast to existing ZO-type methods, achieve global linear convergence for a wide range of minibatch sizes once the iterates enter a local PL region, without restarts or algorithmic modifications. Existing analyses in the literature are mainly limited to large minibatch sizes, which renders the corresponding methods impractical for real-world problems with limited computational capacity. In empirical experiments on black-box models, we show that the new analysis yields superior performance and faster convergence to a solution of nonconvex nonsmooth problems than existing ZO-type methods, which are hampered by small stepsizes. As a byproduct, the proposed analysis is generic and can be applied to other variants of gradient-free variance-reduction methods to make them more efficient.
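To make the setting concrete, here is a minimal, hypothetical Python sketch (not the authors' ZO-PSVRG+ or ZO-PSPIDER+ implementation) of the two building blocks the abstract refers to: a two-point random-direction zeroth-order estimate of the gradient of the smooth part, and a proximal (soft-thresholding) step for an ℓ1 regularizer. All function names, the Gaussian smoothing estimator, the fixed step size, and the plain ZO proximal loop are illustrative assumptions; the paper's algorithms add SVRG/SPIDER-style variance reduction and minibatching on top of these pieces.

```python
# Hypothetical sketch of zeroth-order proximal optimization for
#   min_x  f(x) + lambda * ||x||_1,
# where f can only be queried as a black box (no explicit gradients).
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=10, rng=None):
    """Two-point random-direction (Gaussian smoothing) estimate of grad f(x)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def zo_proximal_descent(f, x0, lam=0.01, eta=0.1, iters=200, seed=0):
    """Plain ZO proximal loop; variance-reduced estimators (SVRG/SPIDER style)
    would replace the raw zo_gradient call below."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        g = zo_gradient(f, x, rng=rng)        # gradient-free estimate of the smooth part
        x = prox_l1(x - eta * g, eta * lam)   # proximal (soft-thresholding) step
    return x

if __name__ == "__main__":
    # Toy black-box loss: least squares queried only through function values.
    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
    f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
    x_hat = zo_proximal_descent(f, np.zeros(20))
    print("objective:", f(x_hat), "nonzeros:", int(np.count_nonzero(x_hat)))
```

Each iteration of this sketch costs 2 * num_dirs function queries; the complexity improvements discussed in the abstract concern how variance reduction and the choice of minibatch size reduce this query count and the number of proximal-oracle calls.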


Data availability

Not applicable.

Code availability

Available.

Notes

  1. https://www.csie.ntu.edu.tw/cjlin/libsvmtools/datasets/binary.html.


Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments that improved the manuscript significantly.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

EK and LW contributed to the theoretical results and analysis; EK performed the empirical studies; EK and LW wrote the paper. All authors contributed to the revised manuscript.

Corresponding author

Correspondence to Ehsan Kazemi.

Ethics declarations

Conflict of interest

Not applicable.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Editor: Lam M. Nguyen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 867 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kazemi, E., Wang, L. Efficient zeroth-order proximal stochastic method for nonconvex nonsmooth black-box problems. Mach Learn 113, 97–120 (2024). https://doi.org/10.1007/s10994-023-06409-7


Keywords
