Abstract
We study an inexact proximal stochastic gradient (IPSG) method for convex composite optimization, in which the objective is the sum of an average of a large number of smooth convex functions and a convex, but possibly nonsmooth, function. Variance reduction techniques are incorporated into the method to reduce the variance of the stochastic gradient. The main feature of the IPSG algorithm is that it allows the proximal subproblems to be solved inexactly while still guaranteeing global convergence with desirable complexity bounds. Several subproblem stopping criteria are proposed. Global convergence and component gradient complexity bounds are derived both when the objective function is strongly convex and when it is merely convex. Preliminary numerical experiments show the overall efficiency of the IPSG algorithm.
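For readers who want the shape of such a method in code, the following is a minimal sketch of one outer loop of a proximal stochastic gradient iteration with SVRG-style variance reduction and an inexactly solved proximal subproblem. It is an illustration under assumptions, not the paper's IPSG algorithm: the helper names (grad_f_i, prox_inexact), the step size eta, the epoch length m, and the fixed tolerance eps are hypothetical, and the paper's actual subproblem stopping criteria are more refined than a single scalar tolerance.

```python
import numpy as np

def ipsg_epoch(x, grad_f_i, prox_inexact, n, m, eta, eps):
    """One illustrative epoch (not the authors' exact scheme).

    x            -- current iterate (numpy array)
    grad_f_i     -- grad_f_i(i, x): gradient of the i-th smooth component
    prox_inexact -- prox_inexact(v, eta, eps): approximate proximal point
                    of the nonsmooth term, solved to tolerance eps
    n            -- number of smooth components, m -- inner-loop length
    """
    x_tilde = x.copy()
    # Full gradient at the snapshot point: the variance-reduction anchor.
    full_grad = sum(grad_f_i(i, x_tilde) for i in range(n)) / n
    for _ in range(m):
        i = np.random.randint(n)
        # SVRG-style variance-reduced stochastic gradient estimate.
        v = grad_f_i(i, x) - grad_f_i(i, x_tilde) + full_grad
        # Inexact proximal step: the subproblem
        #   min_y  h(y) + (1/(2*eta)) * ||y - (x - eta*v)||^2
        # is solved only approximately, to accuracy eps, by a user-supplied
        # inner solver implementing one of the stopping criteria.
        x = prox_inexact(x - eta * v, eta, eps)
    return x
```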
Notes
The datasets are available at http://www.gems-system.org.
Additional information
This research is partially supported by the National Natural Science Foundation of China (Grant 11301505) and the National Science Foundation of the USA (Grant 1522654).
Cite this article
Wang, X., Wang, S. & Zhang, H. Inexact proximal stochastic gradient method for convex composite optimization. Comput Optim Appl 68, 579–618 (2017). https://doi.org/10.1007/s10589-017-9932-7
Keywords
- Convex composite optimization
- Empirical risk minimization
- Stochastic gradient
- Inexact methods
- Global convergence
- Complexity bound