Abstract
DC (difference of convex functions) programming and the DC algorithm (DCA) are powerful tools for nonsmooth nonconvex optimization. The field was created in a preliminary form by Pham Dinh Tao in 1985; intensive research by the authors of this paper has driven decisive developments since 1993, and these tools have now become classic and increasingly popular worldwide. In the thirty-five years since their birth, they have been greatly enriched, thanks to numerous applications by researchers and practitioners around the world, in modeling and solving nonconvex programs from many fields of applied science. This paper is devoted to key open issues, recent advances, and trends in the development of these tools to meet the growing need for nonconvex programming and global optimization. We first outline the foundations of DC programming and DCA, which allows us to highlight the philosophy of these tools, discuss key issues, formulate open problems, and bring relevant answers. After outlining the key open issues that require deeper and more appropriate investigation, we present recent advances and ongoing work on these issues. They turn around novel solution techniques for improving DCA's efficiency and scalability, and a new generation of algorithms beyond the standard framework of DC programming and DCA, designed for large-dimensional DC programs and DC learning with Big Data, as well as for broader classes of nonconvex problems beyond DC programs.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
This is the full paper of the author's plenary lecture, as the winner of the Constantin Caratheodory Prize 2021, at the World Congress on Global Optimization (WCGO 2021).
Appendices
Appendix A Convergence of Standard DCA
Let \(X \subset \mathbb {R}^n\) and \(Y \subset \mathbb {R}^n\) be two nonempty convex sets, and let \(\rho _{i}\) and \(\rho _{i}^{*}\) \((i=1,2)\) be nonnegative real numbers such that \(0 \le \rho _{i}<\rho (f_{i},X)\) (resp. \(0 \le \rho _{i}^{*}<\rho (f_{i}^{*},Y)\)), where \(\rho _{i}=0\) (resp. \(\rho _{i}^{*}=0\)) if \(\rho (f_{i},X)=0\) (resp. \(\rho (f_{i}^{*},Y)=0\)), and \(\rho _{i}\) (resp. \(\rho _{i}^{*}\)) may take the value \(\rho (f_{i},X)\) (resp. \(\rho (f_{i}^{*},Y)\)) if it is attained. We set \(f_{1}=g\) and \(f_{2}=h\), and let \(dx^{k}:=x^{k+1}-x^{k}\) and \(dy^{k}:=y^{k+1}-y^{k}\).
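The modulus \(\rho (f,C)\) used above is not restated in this appendix; for the reader's convenience, here is its standard definition from the DC programming literature, written as a display:

```latex
% Modulus of strong convexity of a convex function f on a convex set C:
\rho(f,C) \;=\; \sup\left\{ \rho \ge 0 \;:\;
    f - \frac{\rho}{2}\,\Vert \cdot \Vert^{2} \ \text{is convex on } C \right\},
% so f is strongly convex on C if and only if \rho(f,C) > 0.
```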
Theorem 11
Let \(X \subset \mathbb {R}^n\) and \(Y \subset \mathbb {R}^n\) be two nonempty convex sets containing the sequences \(\{x^{k}\}\) and \(\{y^{k}\}\) generated by DCA, respectively, and let \(dx^{k}:=x^{k+1}-x^{k}\), \(dy^{k}:=y^{k+1}-y^{k}\). DCA is a descent method without line search but with global convergence, and it enjoys the following key properties:
1. For the primal DC program \((P_{dc})\)
The decrease of the sequence \(\{(g-h)(x^{k})\}\) is expressed by
\[ (g-h)(x^{k+1}) \le (h^{*}-g^{*})(y^{k})-\frac{\rho _{2}}{2}\Vert dx^{k}\Vert ^{2} \le (g-h)(x^{k})-\frac{\rho _{1}+\rho _{2}}{2}\Vert dx^{k}\Vert ^{2}, \quad \forall k, \]
where equality holds if and only if \(x^{k}\in \partial g^{*}(y^{k})\), \(y^{k}\in \partial h(x^{k+1})\) and \((\rho _{1}+\rho _{2})dx^{k}=0\).
In this case, one obtains the following main statements:
1.1 \(x^{k},x^{k+1}\) are DC critical points of \(g-h\) satisfying \(y^{k}\in (\partial g(x^{k})\cap \partial h(x^{k}))\) and \(y^{k}\in (\partial g(x^{k+1})\cap \partial h(x^{k+1})),\)
1.2 \(y^{k}\) is a DC critical point of \(h^{*}-g^{*}\) and \([x^{k},x^{k+1}]\subset (\partial g^{*}(y^{k})\cap \partial h^{*}(y^{k})),\)
1.3 If \(\rho _{1}+\rho _{2}>0\), then \(x^{k+1}=x^{k}\), \(y^{k}=y^{k-1}\) if \(\rho _{1}^{*}>0\) and \(y^{k+1}=y^{k}\) if \(\rho _{2}^{*}>0\).
Furthermore, if g or h is strictly convex on X, then \(x^{k+1}=x^{k}\).
In this case of equality in (A1), DCA terminates at the \(k\)th iteration (finite convergence of DCA).
2 For the dual DC program \((D_{dc})\)
Similarly, the DC duality provides for the dual DC program \((D_{dc})\) the decrease of the sequence \(\{(h^{*}-g^{*})(y^{k})\}\): \((h^{*}-g^{*})(y^{k+1}) \le (g-h)(x^{k+1})-\dfrac{\rho _{2}^{*}}{2}\Vert dy^{k}\Vert ^{2} \le (h^{*}-g^{*})(y^{k})-\dfrac{\rho _{1}^{*}+\rho _{2}^{*}}{2}\Vert dy^{k}\Vert ^{2}, \forall k.\) (A2)
The equality \((h^{*}-g^{*})(y^{k+1})=(h^{*}-g^{*})(y^{k})\) occurs if and only if \(x^{k+1}\in \partial g^{*}(y^{k+1})\), \(y^{k}\in \partial h(x^{k+1})\) and \((\rho _{1}^{*}+\rho _{2}^{*})dy^{k}=0\).
In this case, the following properties hold:
2.1 The equality \((h^{*}-g^{*})(y^{k+1})=(g-h)(x^{k+1})\) holds and \(y^{k},y^{k+1}\) are DC critical points of \(h^{*}-g^{*}\) with \(x^{k+1}\in (\partial g^{*}(y^{k})\cap \partial h^{*}(y^{k}))\) and \(x^{k+1}\in (\partial g^{*}(y^{k+1})\cap \partial h^{*}(y^{k+1})),\)
2.2 \(x^{k+1}\) is a DC critical point of \(g-h\) and \([y^{k},y^{k+1}]\subset (\partial g(x^{k+1})\cap \partial h(x^{k+1})),\)
2.3 \(y^{k+1}=y^{k}\) if \(\rho _{1}^{*}+\rho _{2}^{*}>0,x^{k+1}=x^{k}\) if \(\rho _{2}>0\) and \(x^{k+2}=x^{k+1}\) if \(\rho _{1}>0\).
Furthermore, if \(g^{*}\) or \(h^{*}\) is strictly convex on \(\mathbb {R}^n\), then \(y^{k+1}=y^{k}.\)
As in 1.3, in this case of equality in (A2), DCA terminates at the \(k\)th iteration (finite convergence of DCA).
3. If \(\rho _{1}+\rho _{2}>0\), then the primal DC series \(\sum _{k=0}^{\infty }\Vert x^{k+1}-x^{k}\Vert ^{2}\) converges, with \(\sum _{k=0}^{\infty }\Vert x^{k+1}-x^{k}\Vert ^{2}\le \frac{2}{\rho _{1}+\rho _{2}}[(g-h)(x^{0})-\alpha ].\) (A3)
Dually, if \(\rho _{1}^{*}+\rho _{2}^{*}>0\), then the dual DC series \(\sum _{k=0}^{\infty }\Vert y^{k+1}-y^{k}\Vert ^{2}\) converges, with \(\sum _{k=0}^{\infty }\Vert y^{k+1}-y^{k}\Vert ^{2}\le \frac{2}{\rho _{1}^{*}+\rho _{2}^{*}}[(h^{*}-g^{*})(y^{0})-\alpha ].\) (A4)
4. If \(\alpha \) (the common optimal value of \((P_{dc})\) and \((D_{dc})\)) is finite, the sequences \(\{(g-h)(x^{k})\}\), \(\{(h^{*}-g^{*})(y^{k})\}\) decrease and converge to the same limit \(\beta \ge \alpha \): \(\lim _{k\rightarrow +\infty }(g-h)(x^{k})=\lim _{k\rightarrow +\infty }(h^{*}-g^{*})(y^{k})=\beta .\)
5. If \(\alpha \) is finite and the sequences \(\{x^{k}\}\) and \(\{y^{k}\}\) are bounded, then for every limit point \(x^{*}\) of \(\{x^{k}\}\) (resp. \(y^{*}\) of \(\{y^{k}\}\)) there exists a limit point \(y^{*}\) of \(\{y^{k}\}\) (resp. \(x^{*}\) of \(\{x^{k}\}\)) such that \((x^{*},y^{*})\in [\partial g^{*}(y^{*})\cap \partial h^{*}(y^{*})]\times [\partial g(x^{*})\cap \partial h(x^{*})]\) and \((g-h)(x^{*})=(h^{*}-g^{*})(y^{*})=\beta \ge \alpha .\) Such a point \(x^{*}\) (resp. \(y^{*}\)) is a DC critical point of \(g-h\) (resp. \(h^{*}-g^{*}\)).
6. DCA’s complexity for primal and dual DC programs
Let \(x^{*}\) be a DC critical point of \(g-h\) defined as a limit point of the sequence \(\{x^{k}\}\) computed by the primal DCA. Then, from (A3), one deduces \(f(x^{*})=(g-h)(x^{*})=\beta :=\lim _{k\rightarrow +\infty }f(x^{k})=\lim _{k\rightarrow +\infty }(g-h)(x^{k})\) and \(\frac{\rho _{1}+\rho _{2}}{2}(k+1)\min \{\Vert x^{l+1}-x^{l}\Vert ^{2}:l=0,\ldots ,k\}\le f(x^{0})-f(x^{*}).\) Moreover, if \(\rho _{1}+\rho _{2}>0\), then \(\min \{\Vert x^{l+1}-x^{l}\Vert :l=0,\ldots ,k\}\le \frac{2^{1/2}[f(x^{0})-f(x^{*})]^{1/2}}{(\rho _{1}+\rho _{2})^{1/2}(k+1)^{1/2}}.\)
Likewise, by using the same reasoning for the sequence \(\{y^{k}\}\) via (A4), we obtain similar results for the dual DCA: \((h^{*}-g^{*})(y^{*})=\beta :=\lim _{k\rightarrow +\infty }(h^{*}-g^{*})(y^{k})\) and \(\frac{\rho _{1}^{*}+\rho _{2}^{*}}{2}\sum \limits _{l=0}^{k}\Vert y^{l+1}-y^{l}\Vert ^{2} \le (h^{*}-g^{*})(y^{0})-(h^{*}-g^{*})(y^{k+1}) \le (h^{*}-g^{*})(y^{0})-\beta \le (h^{*}-g^{*})(y^{0})-\alpha ,~\forall k.\) Hence, if \(\rho _{1}^{*}+\rho _{2}^{*}>0\), we obtain
\(\min \{\Vert y^{l+1}-y^{l}\Vert :l=0,\ldots ,k\}\le \frac{2^{1/2}[(h^{*}-g^{*})(y^{0})-(h^{*}-g^{*})(y^{*})]^{1/2}}{(\rho _{1}^{*}+\rho _{2}^{*})^{1/2}(k+1)^{1/2}}.\)
Therefore, both the primal and the dual DCA have an \(O(1/\sqrt{k})\) complexity.
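To make the descent, criticality, and complexity statements above concrete, here is a minimal DCA run on an assumed toy DC decomposition (an illustration, not an example from the paper): \(g(x)=x^{4}\), \(h(x)=x^{2}\), so \(f=g-h\) is the double well \(x^{4}-x^{2}\), with \(\rho _{1}=\rho (g,\mathbb {R})=0\) and \(\rho _{2}=\rho (h,\mathbb {R})=2\). The convex subproblem \(\min _{x}\, g(x)-y^{k}x\) has the closed-form solution \(4x^{3}=y^{k}\).

```python
import math

# Toy DCA sketch (assumed decomposition, not from the paper):
# f(x) = g(x) - h(x) with g(x) = x**4, h(x) = x**2.
# DC critical points: x = 0 and x = +-1/sqrt(2) with f = -1/4.
f = lambda t: t**4 - t**2

x, xs = 2.0, [2.0]
for _ in range(60):
    y = 2.0 * x                              # dual step: y^k = h'(x^k)
    s = 1.0 if y >= 0 else -1.0              # primal step: solve 4x^3 = y^k
    x = s * (abs(y) / 4.0) ** (1.0 / 3.0)
    xs.append(x)

fs = [f(t) for t in xs]
# 1. Descent without linesearch: {f(x^k)} is nonincreasing.
assert all(fs[k + 1] <= fs[k] + 1e-12 for k in range(len(fs) - 1))
# 2. Convergence to the DC critical point x* = 1/sqrt(2).
assert abs(xs[-1] - 0.5 ** 0.5) < 1e-9
# 3. O(1/sqrt(k)) estimate: with rho_1 + rho_2 = 2 and beta = lim f(x^k),
#    min_{l<=k} |x^{l+1}-x^l| <= sqrt(2*(f(x^0)-beta) / ((rho_1+rho_2)*(k+1))).
beta, rho = fs[-1], 2.0
for k in range(60):
    min_step = min(abs(xs[l + 1] - xs[l]) for l in range(k + 1))
    bound = math.sqrt(2.0 * (fs[0] - beta) / (rho * (k + 1)))
    assert min_step <= bound + 1e-12
```

The same run makes statement 5 visible: at the limit point \(x^{*}=1/\sqrt{2}\) one has \(h'(x^{*})=2x^{*}=4x^{*3}=g'(x^{*})\), i.e. \(\partial g(x^{*})\cap \partial h(x^{*})\ne \emptyset \), so \(x^{*}\) is DC critical.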
Appendix B Global convergence of GDCA1
Denote by \(I(x):=\left\{ i\in \{1,\ldots ,m\}: ~ f_{i}(x)=p(x)\right\} \). We say that the extended Mangasarian-Fromowitz constraint qualification (EMFCQ) is satisfied at \(x^{*}\in E\) with \(I(x^{*})\not =\emptyset \) if there exists a direction \(d\), feasible for \(C\) at \(x^{*}\), such that \(f_{i}^{\uparrow }(x^{*},d)<0\) for all \(i\in I(x^{*})\).
When the functions \(f_{i}\) are continuously differentiable, \(f_{i}^{\uparrow }(x^{*},d)=\langle \nabla f_{i}(x^{*}),d\rangle ,\) and (EMFCQ) reduces to the well-known Mangasarian-Fromowitz constraint qualification. It is well known that if the (extended) Mangasarian-Fromowitz constraint qualification is satisfied at a local minimizer \(x^{*}\) of problem (6), then the KKT first-order necessary condition (7) holds (see [160, 161]). In the global convergence theorem, we make use of the following assumption:
Assumption 3
The (extended) Mangasarian-Fromowitz constraint qualification (EMFCQ) is satisfied at any \(x\in {{\mathbb {R}}}^{n}\) with \(p(x)\ge 0.\)
When \(f_{i}\), \(i=1,\ldots ,m,\) are all convex functions, this assumption is clearly satisfied under the Slater condition, i.e., when there exists a point \(\bar{x}\) such that \(f_{i}(\bar{x})<0\) for all \(i=1,\ldots ,m.\)
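A small numeric check of this remark, on an assumed convex instance (the functions below are illustrative, not from the paper): for convex differentiable \(f_{i}\) and a Slater point \(\bar{x}\), the direction \(d=\bar{x}-x\) satisfies \(\langle \nabla f_{i}(x),d\rangle \le f_{i}(\bar{x})-f_{i}(x)<0\) for every active index \(i\) at a point \(x\) with \(p(x)\ge 0\), so an MFCQ direction always exists there.

```python
# Assumed convex instance (not from the paper):
# f1(x) = x1^2 + x2^2 - 1, f2(x) = x1 + x2 - 1; Slater point xbar = (0, 0).
f = [lambda x: x[0]**2 + x[1]**2 - 1.0, lambda x: x[0] + x[1] - 1.0]
grad = [lambda x: (2.0 * x[0], 2.0 * x[1]), lambda x: (1.0, 1.0)]
xbar = (0.0, 0.0)                         # f_i(xbar) < 0 for all i

x = (1.0, 0.5)                            # a point with p(x) >= 0
p = max(fi(x) for fi in f)                # p(x) = max_i f_i(x)
active = [i for i in range(len(f)) if abs(f[i](x) - p) < 1e-12]
d = (xbar[0] - x[0], xbar[1] - x[1])      # candidate MFCQ direction d = xbar - x
# convexity gives <grad f_i(x), d> <= f_i(xbar) - f_i(x) < 0 on active indices
slopes = [sum(g * di for g, di in zip(grad[i](x), d)) for i in active]
assert p >= 0 and all(s < 0 for s in slopes)
```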
Theorem 12
Suppose that \(C\subseteq {{\mathbb {R}}}^{n}\) is a nonempty closed convex set and \(f_{i}\), \(i=1,\ldots ,m\) are DC functions on C. Suppose further that Assumptions 1–3 are verified. Let \(\delta >0,\) \(\beta _{1}>0\) be given. Let \(\{x^{k}\}\) be a sequence generated by GDCA1. Then GDCA1 either stops, after finitely many iterations, at a KKT point \(x^{k}\) for problem (6) or generates an infinite sequence \(\{x^{k}\}\) of iterates such that \(\lim _{k\rightarrow \infty }\Vert x^{k+1}-x^{k}\Vert =0\) and every limit point \(x^{\infty }\) of the sequence \(\{x^{k}\}\) is a KKT point of problem (6).
Appendix C Global convergence of GDCA2
Recall, as defined in the preceding section, that \(\varphi _{k}(x):=f_{0}(x)+\beta _{k}p^{+}(x).\) The following lemma is needed to investigate the convergence of GDCA2.
Lemma 13
The sequence \((x^{k},t^{k})\) generated by GDCA2 satisfies the following inequality: \(\varphi _{k}(x^{k})-\varphi _{k}(x^{k+1})\ge \frac{\rho }{2}\Vert x^{k+1}-x^{k}\Vert ^{2}\) for all \(k=1,2,\ldots \), where \(\rho :=\rho (g_{0},C)+\rho (h_{0},C)+\min \{\rho (g_{i},C): ~ i=1,\ldots ,m\}.\)
Theorem 14
Suppose that \(C\subseteq {{\mathbb {R}}}^{n}\) is a nonempty closed convex set and \(f_{i}, i=1,\ldots ,m,\) are DC functions on C such that Assumptions 1 and 3 are verified. Suppose further that for each \(i=0,\ldots ,m,\) either \(g_{i}\) or \(h_{i}\) is differentiable on C and that \(\rho :=\rho (g_{0},C)+\rho (h_{0},C)+\min \{\rho (g_{i},C): i=1,\ldots ,m\}>0.\) Let \(\delta _{1},\delta _{2}>0,\) \(\beta _{1}>0\) be given. Let \(\{x^{k}\}\) be a sequence generated by GDCA2. Then GDCA2 either stops, after finitely many iterations, at a KKT point \(x^{k}\) for problem (6) or generates an infinite sequence \(\{x^{k}\}\) of iterates such that \(\lim _{k\rightarrow \infty }\Vert x^{k+1}-x^{k}\Vert =0\) and every limit point \(x^{\infty }\) of the sequence \(\{x^{k}\}\) is a KKT point of problem (6).
Note that, as shown in Theorems 12 and 14, the penalty parameter \(\beta _{k}\) remains constant once k is sufficiently large. As observed in the proofs of these convergence theorems, the sequence \(\{\varphi _{k}(x^{k})\}\) of values of \(\varphi _{k}(x)=f_{0}(x)+\beta _{k}p^{+}(x)\) along the sequences \(\{x^{k}\}\) generated by GDCA1 and GDCA2 is decreasing. These results remain valid if, in (11), we replace the variable t by \(t_{i}\) for \(i=1,\ldots ,m\) and the function \(\beta _{k}t\) by \(\beta _{k}\sum _{i=1}^{m}t_{i}.\)
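The role of a sufficiently large, eventually constant penalty parameter can be seen on a one-dimensional toy instance (an assumed example, not the paper's GDCA iterations): minimize \(f_{0}(x)=-x\) subject to \(f_{1}(x)=x^{2}-1\le 0\), whose solution is \(x^{*}=1\). Beyond the exact-penalty threshold (here \(\beta \ge 1/2\)), the unconstrained minimizer of \(\varphi _{\beta }(x)=f_{0}(x)+\beta p^{+}(x)\) coincides with \(x^{*}\); below it, the penalized minimizer is infeasible.

```python
# Toy exact-penalty illustration (assumed instance, not the paper's GDCA):
# min f0(x) = -x  s.t.  f1(x) = x^2 - 1 <= 0,  solution x* = 1.
def phi(x, beta):
    # phi_beta(x) = f0(x) + beta * p^+(x) with p^+(x) = max(f1(x), 0)
    return -x + beta * max(x * x - 1.0, 0.0)

def argmin_on_grid(beta, lo=-3.0, hi=6.0, n=90001):
    # brute-force grid minimization, good enough for a 1-D illustration
    step = (hi - lo) / (n - 1)
    pts = [lo + i * step for i in range(n)]
    return min(pts, key=lambda t: phi(t, beta))

x_small = argmin_on_grid(0.1)   # beta below threshold: minimizer near x = 5, infeasible
x_large = argmin_on_grid(1.0)   # beta above threshold: minimizer near x = 1 = x*
```

This mirrors the remark above: once \(\beta _{k}\) exceeds the threshold and stays constant, minimizing \(\varphi _{k}\) recovers the constrained solution.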
Le Thi, H.A., Pham Dinh, T.: Open issues and recent advances in DC programming and DCA. J. Glob. Optim. 88, 533–590 (2024). https://doi.org/10.1007/s10898-023-01272-1