Abstract
We introduce the notion of predicted decrease approximation (PDA) for constrained convex optimization, a flexible framework that includes as special cases known algorithms such as the generalized conditional gradient, proximal gradient, greedy coordinate descent for separable constraints, and working set methods for linear equality constraints with bounds. The new scheme allows the development of a unified convergence analysis for these methods. We further consider a partially strongly convex nonsmooth model and show that dual application of PDA-based methods yields new sublinear convergence rate estimates in terms of both primal and dual objectives. As an application, we provide an explicit working set selection rule for SMO-type methods for training the support vector machine, together with an improved primal convergence analysis.
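To make the conditional gradient special case concrete, the following is a minimal sketch (not taken from the paper) of the classical Frank–Wolfe method over the unit simplex with the standard open-loop step size 2/(k+2). The quantity g·(x−s), the decrease predicted by the linear model at x, is the usual duality-gap certificate that frameworks of this kind generalize. The function name frank_wolfe_simplex, the tolerance, and the least-squares example are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=100, tol=1e-8):
    """Classical conditional gradient (Frank-Wolfe) over the unit simplex.

    grad: callable returning the gradient of a smooth convex f at x.
    x0:   feasible starting point (nonnegative entries summing to 1).
    Uses the open-loop step size 2 / (k + 2).
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle over the simplex: the vertex e_i
        # with the smallest gradient coordinate minimizes <g, s>.
        i = np.argmin(g)
        s = np.zeros_like(x)
        s[i] = 1.0
        # Predicted decrease of the linear model: <g, x - s>.
        # By convexity it upper-bounds f(x) - f*, so it serves as a
        # stopping certificate (the Frank-Wolfe duality gap).
        gap = g @ (x - s)
        if gap <= tol:
            break
        step = 2.0 / (k + 2)
        x = x + step * (s - x)
    return x

# Usage: minimize f(x) = 0.5 * ||A x - b||^2 over the simplex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x = frank_wolfe_simplex(lambda z: A.T @ (A @ z - b), np.full(5, 0.2))
print(x, x.sum())
```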



Additional information
The research of Amir Beck was partially supported by the Israel Science Foundation Grant 1821/16. The research of Edouard Pauwels was partially sponsored by a grant from the Air Force Office of Scientific Research, Air Force Materiel Command (Grant No. FA9550-15-1-0500). Most of this work took place during Edouard Pauwels' postdoctoral stay at the Technion, Haifa, Israel.
Cite this article
Beck, A., Pauwels, E. & Sabach, S. Primal and dual predicted decrease approximation methods. Math. Program. 167, 37–73 (2018). https://doi.org/10.1007/s10107-017-1108-9