Abstract
In this paper, a block mirror stochastic gradient method is developed to solve both convex and nonconvex stochastic optimization problems in which the feasible set and the variables are partitioned into multiple blocks. The proposed method combines features of the classical stochastic mirror descent method and the block coordinate gradient descent method. Acquiring stochastic gradient information from stochastic oracles, it updates all blocks of variables in a Gauss–Seidel fashion. We establish convergence for both the convex and nonconvex cases. The analysis is challenging because the usual unbiasedness assumption on the stochastic gradient fails to hold under Gauss–Seidel updates, so more specific assumptions are required. The proposed algorithm is tested on the conditional value-at-risk problem and the stochastic LASSO problem to demonstrate its efficiency.
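To make the update scheme concrete, the following is a minimal sketch of one possible realization of a Gauss–Seidel block mirror stochastic gradient step, not the authors' implementation. It assumes, purely for illustration, Euclidean distance-generating functions on each block (so the mirror step reduces to a projection), box-shaped block feasible sets, and hypothetical helpers stoch_grad, sample_xi, and step supplied by the user.

```python
import numpy as np

def project_box(v, lo, hi):
    """Euclidean projection onto the box [lo, hi]; the mirror step for omega = 0.5*||.||^2."""
    return np.clip(v, lo, hi)

def block_mirror_sgd(x0, blocks, stoch_grad, sample_xi, step, lo=-1.0, hi=1.0,
                     n_iters=100, minibatch=8, seed=None):
    """Sketch of cyclic (Gauss-Seidel) block updates with averaged stochastic
    partial gradients and a Euclidean mirror (projection) step per block."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(n_iters):
        for i, idx in enumerate(blocks):  # blocks are updated in a fixed order
            # Average a small minibatch of stochastic partial gradients for block i,
            # evaluated at the current x, which already contains the new values of
            # the blocks updated earlier in this sweep (Gauss-Seidel style).
            g_i = np.mean([stoch_grad(x, sample_xi(rng), i) for _ in range(minibatch)], axis=0)
            # Mirror step; in this Euclidean setup it is a projected gradient step.
            x[idx] = project_box(x[idx] - step(k, i) * g_i, lo, hi)
    return x

# Toy usage: minimize E_xi 0.5*||x - xi||^2 over a box, with two blocks of variables.
blocks = [np.arange(0, 5), np.arange(5, 10)]
x_out = block_mirror_sgd(
    x0=np.zeros(10),
    blocks=blocks,
    stoch_grad=lambda x, xi, i: x[blocks[i]] - xi[blocks[i]],
    sample_xi=lambda rng: rng.normal(0.3, 1.0, size=10),
    step=lambda k, i: 1.0 / np.sqrt(k + 1),
)
```

The key design point reflected above is that each block's stochastic gradient is queried at the most recently updated iterate, which is what breaks the standard unbiasedness argument and motivates the more specific assumptions discussed in the paper.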
Notes
Downloaded from http://www.resset.cn/.
Downloaded from https://archive.ics.uci.edu/ml/datasets/.
Acknowledgements
The authors would like to thank Prof. Chengbo Yang for discussing the conditional value-at-risk problem in Sect. 4. The authors also sincerely thank the anonymous referees for their valuable comments and suggestions, which helped improve the manuscript significantly.
Funding
The research is partly supported by the National Key Research and Development Program of China (2020YFA0714101), NSFC (11701210, 11601183, 61872162, 12171199), the Education Department Project of Jilin Province (JJKH20211031KJ), the Science and Technology Department of Jilin Province (20180520212JH, 20190103029JH, 20200201269JC, 20210201015GX), and the Fundamental Research Funds for the Central Universities.
Ethics declarations
Conflict of interest
All the authors declare they have no financial interests.
Appendix A: Proofs of some lemmas in this paper
Proof of Lemma 1
From the strong convexity of \(\omega _{i}\), it holds that
where the second inequality follows from (6). This implies
Let us consider the function \(\varphi _{\textbf{x}_{i}}(\textbf{z})=\omega _{i}(\textbf{z}) -\langle \nabla \omega _{i}(\textbf{x}_{i}),\textbf{z}\rangle \). It is easy to see that \(\varphi _{\textbf{x}_{i}}\) is strongly convex with the same parameter \(\alpha _{i}\), since
Then we have
which gives the result. \(\square \)
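The fact that \(\varphi _{\textbf{x}_{i}}\) inherits the strong convexity modulus of \(\omega _{i}\) follows from a standard calculation; a sketch, using \(\Vert \cdot \Vert \) for the norm on the \(i\)-th block, is that the linear term cancels in the Bregman-type difference:
\[
\varphi _{\textbf{x}_{i}}(\textbf{z})-\varphi _{\textbf{x}_{i}}(\textbf{y})-\langle \nabla \varphi _{\textbf{x}_{i}}(\textbf{y}),\textbf{z}-\textbf{y}\rangle
=\omega _{i}(\textbf{z})-\omega _{i}(\textbf{y})-\langle \nabla \omega _{i}(\textbf{y}),\textbf{z}-\textbf{y}\rangle
\ge \frac{\alpha _{i}}{2}\Vert \textbf{z}-\textbf{y}\Vert ^{2},
\]
because \(\nabla \varphi _{\textbf{x}_{i}}(\textbf{y})=\nabla \omega _{i}(\textbf{y})-\nabla \omega _{i}(\textbf{x}_{i})\) and the terms involving \(\nabla \omega _{i}(\textbf{x}_{i})\) cancel.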
Proof of Lemma 2
Note that \(\sum _{t=1}^{T-1}\big [\textbf{G}_{i}(\textbf{x}^{k},\xi _{t}^{k}) - \textbf{g}_{i}(\textbf{x}^{k})\big ]\) depends only on \(\{\xi _{t}^{k}\}_{t=1}^{T-1}\) and is therefore independent of \(\textbf{G}_{i}(\textbf{x}^{k},\xi _{T}^{k}) - \textbf{g}_{i}(\textbf{x}^{k})\) for any \(T \ge 2\). This, together with (2), yields that
Then, for the Euclidean norm \(\Vert \cdot \Vert \), we have
Under Assumption 1, this implies that
Then the equivalence of the norm \(\Vert \cdot \Vert _{\mathcal {E}_i,*}\) and the norm \(\Vert \cdot \Vert \) on \({\mathbb {R}}^{n_i}\) completes the proof. \( \square \)
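The main estimate behind bounds of this type is the standard variance calculation for an average of independent, zero-mean errors; a sketch, writing \(\delta _{t}=\textbf{G}_{i}(\textbf{x}^{k},\xi _{t}^{k})-\textbf{g}_{i}(\textbf{x}^{k})\) and assuming, in the spirit of Assumption 1, a bound \(\mathbb {E}\Vert \delta _{t}\Vert ^{2}\le \sigma _{i}^{2}\) (the symbol \(\sigma _{i}\) is used here only for illustration), is
\[
\mathbb {E}\Big \Vert \frac{1}{T}\sum _{t=1}^{T}\delta _{t}\Big \Vert ^{2}
=\frac{1}{T^{2}}\sum _{t=1}^{T}\mathbb {E}\Vert \delta _{t}\Vert ^{2}
+\frac{2}{T^{2}}\sum _{1\le s<t\le T}\mathbb {E}\langle \delta _{s},\delta _{t}\rangle
=\frac{1}{T^{2}}\sum _{t=1}^{T}\mathbb {E}\Vert \delta _{t}\Vert ^{2}
\le \frac{\sigma _{i}^{2}}{T},
\]
since the cross terms vanish by independence and \(\mathbb {E}\delta _{t}=0\).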
Proof of Lemma 3
By Remark 2, it can be seen from (10), (11), and (14) that, for any \(i\) and \(k\),
Also, it holds by Lemma 2 that
This completes the proof of Lemma 3. \(\square \)
Proof of Lemma 4
By (8) and the optimality of \(\textbf{x}_{i}^{k+1}\), we have
Because of the strong convexity of \(\omega _{i}\), we obtain
On the other hand, it holds by Assumption 3 that
Then, we obtain that
In addition, under Assumption 2, for any \(\xi _{t}^{k} \in \Xi ^{k} \subseteq \Xi \), it holds that
Note that
then by the uncorrelated condition, we have
Using the above observation and the auxiliary notation \(\Lambda \) defined by
we get that
Also, observe that
and by the same argument as in the proof of Lemma 3 and the definition of \(\Lambda \), it follows that
Moreover, by Assumption 2, it is clear that
which, together with Lemma 3, gives that
Furthermore, by the above observation, we conclude that
completing the proof. \(\square \)
Proof of Lemma 7
Let \(\textbf{x}_{\textbf{g}_{i}}^{k+1} =\textbf{x}_{i}^{k}-\gamma _{i}^{k}\mathcal {G}_{i}\left( \textbf{x}_{i}^{k},\textbf{g}(\textbf{x}^{k}),\gamma _{i}^{k} \right) =P_{i}\left( \textbf{x}_{i}^{k},\textbf{g}(\textbf{x}^{k}),\gamma _{i}^{k} \right) \). By the optimality condition of (48), and the definition of \(\textbf{x}_{i}^{k+1}\) and \(\textbf{x}_{\textbf{g}_{i}}^{k+1}\), we have
and
respectively. Summing up the above two inequalities, one has
where the second inequality follows from the strong convexity of \(\omega _{i}\). Then, it holds that
Using the above relation, we obtain
Hence, we have
and obtain the result as required in (51). \(\square \)