Abstract
We study a challenging problem in machine learning: reduced-rank multitask linear regression with covariance matrix estimation. The objective is to build a linear relationship between the multiple output variables and the input variables of a multitask learning process, taking into account the general covariance structure of the regression errors on the one hand, and the reduced-rank structure of the regression model on the other hand. The problem is formulated as the minimization of a nonconvex function in two joint matrix variables (X,Θ) under a low-rank constraint on X and a positive definiteness constraint on Θ. It is doubly difficult because of the nonconvexity of the objective function and the low-rank constraint. We investigate a nonconvex, nonsmooth optimization approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) for this hard problem. A penalty reformulation, which takes the form of a partial DC program, is considered. An alternating DCA and its inexact version are developed; both algorithms converge to a weak critical point of the considered problem. Numerical experiments are performed on several synthetic and benchmark real multitask linear regression datasets. The numerical results demonstrate the performance of the proposed algorithms and their superiority over three classical alternating/joint methods.
References
Aldrin, M.: Reduced-Rank Regression. Encyclopedia of Environmetrics, Vol. 3. Wiley, pp. 1724–1728 (2002)
Chen, L., Huang, J.Z.: Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. J. Am. Stat. Assoc. 107(500), 1533–1545 (2012)
Chen, L., Huang, J.Z.: Sparse reduced-rank regression with covariance estimation. Stat. Comput. 26(1), 461–470 (2016)
Cover, T.M., Thomas, A.: Determinant inequalities via information theory. SIAM J. Matrix Anal. Appl. 9(3), 384–392 (1988)
Dev, H., Sharma, N.L., Dawson, S.N., Neal, D.E., Shah, N.: Detailed analysis of operating time learning curves in robotic prostatectomy by a novice surgeon. BJU Int. 109(7), 1074–1080 (2012)
Dubois, B., Delmas, J.F., Obozinski, G.: Fast algorithms for sparse reduced-rank regression. In: Chaudhuri, K., Sugiyama, M. (eds.) Proceedings of Machine Learning Research, vol. 89, pp. 2415–2424. PMLR (2019)
Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1, 211–218 (1936)
Foygel, R., Horrell, M., Drton, M., Lafferty, J.: Nonparametric reduced rank regression. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp 1628–1636. Curran Associates, Inc. (2012)
Ha, W., Foygel Barber, R.: Alternating minimization and alternating descent over nonconvex sets. ArXiv e-prints arXiv:1709.04451 (2017)
Harrison, L., Penny, W., Friston, K.: Multivariate autoregressive modeling of fMRI time series. Neuroimage 19, 1477–1491 (2003)
He, D., Parida, L., Kuhn, D.: Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction. Bioinformatics 32(12), i37–i43 (2016)
Hu, Z., Nie, F., Wang, R., Li, X.: Low rank regularization: A review. Neural Networks. In Press. Available online 31 October 2020. https://doi.org/10.1016/j.neunet.2020.09.021 (2020)
Hyams, E., Mullins, J., Pierorazio, P., Partin, A., Allaf, M., Matlaga, B.: Impact of robotic technique and surgical volume on the cost of radical prostatectomy. J. Endourol. 27(3), 298–303 (2013)
Ioffe, A., Tihomirov, V.: Theory of extremal problems. North-Holland (1979)
Izenman, A.J.: Reduced-rank regression for the multivariate linear model. J. Multivar. Anal. 5(2), 248–264 (1975)
Koshi, S.: Convergence of convex functions and duality. Hokkaido Math. J. 14(3), 399–414 (1985)
Le, H.M., Le Thi, H.A., Nguyen, M.C.: Sparse semi-supervised support vector machines by DC programming and DCA. Neurocomputing 153, 62–76 (2015)
Le Thi, H.A.: Analyse numérique des algorithmes de l’optimisation DC. approches locale et globale. codes et simulations numériques en grande dimension. applications. Ph.D. thesis, University of Rouen France (1994)
Le Thi, H.A.: Solving large scale molecular distance geometry problems by a smoothing technique via the Gaussian transform and D.C. programming. J. Glob. Optim. 27(1), 375–397 (2003)
Le Thi, H.A.: Portfolio selection under downside risk measures and cardinality constraints based on DC programming and DCA. Comput. Manag. Sci. 6 (4), 459–475 (2009)
Le Thi, H.A.: DC Programming and DCA for supply chain and production management: state-of-the-art models and methods. Int. J. Prod. Res. 58 (20), 6078–6114 (2020)
Le Thi, H.A., Ho, V.T.: Online learning based on online DCA and application to online classification. Neural Comput. 32(4), 759–793 (2020)
Le Thi, H.A., Ho, V.T., Pham Dinh, T.: A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning. J. Glob. Optim. 73(2), 279–310 (2019)
Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: DC Programming and DCA for General DC Programs. In: Van Do, T., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge Engineering, vol. 282, pp 15–35. Springer International Publishing (2014)
Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: Alternating DC Algorithm for Partial DC Programming. Technical report, University of Lorraine (2016)
Le Thi, H.A., Le, H.M., Pham Dinh, T.: New and efficient DCA based algorithms for minimum sum-of-squares clustering. Pattern Recogn. 47 (1), 388–401 (2014)
Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: Stochastic DCA for the large-sum of non-convex functions problem and its application to group variable selection in classification. In: Proceedings of the 34th International Conference on Machine Learning, Vol. 70. JMLR.org, pp. 3394–3403 (2017)
Le Thi, H.A., Nguyen, M.C.: DCA Based algorithms for feature selection in multi-class support vector machine. Ann. Oper. Res. 249(1), 273–300 (2017)
Le Thi, H.A., Nguyen, M.C., Pham Dinh, T.: A DC programming approach for finding communities in networks. Neural Comput. 26(12), 2827–2854 (2014)
Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133(1–4), 23–46 (2005)
Le Thi, H.A., Pham Dinh, T.: Difference of convex functions algorithms (DCA) for image restoration via a Markov random field model. Optim. Eng. 18 (4), 873–906 (2017)
Le Thi, H.A., Pham Dinh, T.: DC Programming and DCA: thirty years of developments. Math. Program., Special Issue: DC Programming - Theory, Algorithms and Applications 169(1), 5–68 (2018)
Le Thi, H.A., Pham Dinh, T., Le, H.M., Vo, X.T.: DC Approximation approaches for sparse optimization. Eur. J. Oper. Res. 244(1), 26–46 (2015)
Le Thi, H.A., Pham Dinh, T., Ngai, H.V.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52(3), 509–535 (2012)
Le Thi, H.A., Phan, D.N.: DC Programming and DCA for sparse optimal scoring problem. Neurocomputing 186, 170–181 (2016)
Le Thi, H.A., Ta, A.S., Pham Dinh, T.: An efficient DCA based algorithm for power control in large scale wireless networks. Appl. Math. Comput. 318, 215–226 (2018)
Lee, C.L., Lee, C.A., Lee, J.: Handbook of Quantitative Finance and Risk Management. Springer, USA (2010)
Magnus, J.R., Neudecker, H.: Matrix differential calculus with applications to simple, Hadamard, and Kronecker products. J. Math. Psychol. 29(4), 474–492 (1985)
Nguyen, M.N., Le Thi, H.A., Daniel, G., Nguyen, T.A.: Smoothing techniques and difference of convex functions algorithms for image reconstructions. Optim. 69(7-8), 1601–1633 (2020)
Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
Pham Dinh, T., Le Thi, H.A.: DC Optimization algorithms for solving the trust region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
Pham Dinh, T., Le Thi, H.A.: Recent Advances in DC Programming and DCA. In: Nguyen, N.T., Le Thi, H.A. (eds.) Transactions on Computational Intelligence XIII, vol. 8342, pp 1–37. Springer, Berlin (2014)
Phan, D.N., Le Thi, H.A.: Group variable selection via ℓp,0 regularization and application to optimal scoring. Neural Netw. 118, 220–234 (2019)
Reinsel, G.C., Velu, R.P.: Multivariate Reduced-Rank Regression: Theory and Applications, 1st edn. Lecture Notes in Statistics 136. Springer, New York (1998)
Salinetti, G., Wets, R.J.: On the relations between two types of convergence for convex functions. J. Math. Anal. Appl. 60(1), 211–226 (1977)
Smith, A.E., Coit, D.W.: Constraint-handling techniques - penalty functions. In: Handbook of Evolutionary Computation, Oxford University Press, pp. C5.2:1–C5.2.6 (1997)
Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)
Tran, T.T., Le Thi, H.A., Pham Dinh, T.: DC Programming and DCA for enhancing physical layer security via cooperative jamming. Comput. Oper. Res. 87, 235–244 (2017)
Wold, S., Sjöström, M., Eriksson, L.: PLS-Regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58(2), 109–130 (2001)
Yuan, M., Ekici, A., Lu, Z., Monteiro, R.: Dimension reduction and coefficient estimation in multivariate linear regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 69(3), 329–346 (2007)
Zălinescu, C.: Convex analysis in general vector spaces. World Scientific (2002)
Ethics declarations
Conflict of Interests
The authors declare that there is no conflict of interest.
Appendices
Appendix A: Estimate of μΘ
The next lemma indicates how to estimate the value of μΘ so as to guarantee the convexity of the function \(\widetilde{H}(\cdot,\boldsymbol{\varTheta})\) defined in Theorem 2. In this lemma, ∥⋅∥2 denotes the ℓ2-norm (or spectral norm) of a matrix.
Lemma 1
For each fixed \(\boldsymbol{\varTheta} \in \mathbb{R}^{m \times m}\), if \(\mu_{\boldsymbol{\varTheta}} \leq 1/\left(\left\|\frac{2}{n}\sum_{i=1}^{n}\boldsymbol{\phi}_{i}\boldsymbol{\phi}_{i}^{\top}\right\|_{2}\|\boldsymbol{\varTheta}\|_{2} + 2\alpha\right)\), then \(\widetilde{H}(\cdot,\boldsymbol{\varTheta})\) is convex.
Proof
Knowing that the function \(\max_{\boldsymbol{Y} \in \mathcal{X}} (2\langle \boldsymbol{X}, \boldsymbol{Y} \rangle - \|\boldsymbol{Y}\|_{F}^{2})\) is convex in X and that the sum of two convex functions is convex, it suffices to choose μΘ such that the function \(\left(\frac{1}{2\mu_{\boldsymbol{\varTheta}}}-\alpha\right) \|\boldsymbol{X}\|_{F}^{2} - \frac{1}{n} \sum_{i=1}^{n} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})^{\top} \boldsymbol{\varTheta} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})\) is convex. To this end, we can take \(\frac{1}{\mu_{\boldsymbol{\varTheta}}}-2\alpha\) greater than or equal to the spectral radius of the Hessian matrix of \({\varLambda}(\boldsymbol{X}) = \frac{1}{n} \sum_{i=1}^{n} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})^{\top} \boldsymbol{\varTheta} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})\), i.e. \(\frac{1}{\mu_{\boldsymbol{\varTheta}}}-2\alpha \geq \rho(\nabla^{2} {\varLambda}(\boldsymbol{X}))\). From matrix differential calculus (see, e.g., [38]), we have
\[\nabla^{2} {\varLambda}(\boldsymbol{X}) = \left(\frac{2}{n} \sum_{i=1}^{n}\boldsymbol{\phi}_{i} \boldsymbol{\phi}_{i}^{\top} \right) \otimes \boldsymbol{\varTheta},\]
where ⊗ denotes the Kronecker product. Since \(\nabla^{2}{\varLambda}(\boldsymbol{X})\) is symmetric, we obtain \(\rho(\nabla^{2} {\varLambda}(\boldsymbol{X})) = \|\nabla^{2} {\varLambda}(\boldsymbol{X})\|_{2} = \left\|\left(\frac{2}{n} \sum_{i=1}^{n}\boldsymbol{\phi}_{i} \boldsymbol{\phi}_{i}^{\top}\right) \otimes \boldsymbol{\varTheta}\right\|_{2} = \left\|\frac{2}{n} \sum_{i=1}^{n}\boldsymbol{\phi}_{i} \boldsymbol{\phi}_{i}^{\top}\right\|_{2} \|\boldsymbol{\varTheta}\|_{2}\). This completes the proof. □
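For illustration, the bound of Lemma 1 can be evaluated directly from the data. The following minimal NumPy sketch (the function name and the random test data are ours, for illustration only) computes the right-hand side of the inequality in Lemma 1.

```python
import numpy as np

def mu_theta_bound(Phi, Theta, alpha):
    """Largest admissible value of mu_Theta according to Lemma 1.

    Phi   : (d, n) matrix whose columns are the input vectors phi_i.
    Theta : (m, m) symmetric positive definite matrix.
    alpha : nonnegative regularization parameter of the model.
    """
    n = Phi.shape[1]
    # Spectral norm of (2/n) * sum_i phi_i phi_i^T = (2/n) * Phi Phi^T.
    norm_phi = np.linalg.norm((2.0 / n) * (Phi @ Phi.T), 2)
    # Spectral norm of Theta.
    norm_theta = np.linalg.norm(Theta, 2)
    # Any mu_Theta less than or equal to this value keeps H~(., Theta) convex.
    return 1.0 / (norm_phi * norm_theta + 2.0 * alpha)

# Example with random data: d = 5 inputs, m = 3 outputs, n = 100 samples.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((5, 100))
A = rng.standard_normal((3, 3))
Theta = A @ A.T + np.eye(3)  # symmetric positive definite
print(mu_theta_bound(Phi, Theta, alpha=0.1))
```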
Appendix B: Comparative algorithms for solving the problem (2)
The Al-M method alternates between updating the variables X and Θ at each iteration. In particular, at iteration k, for fixed Θ, we compute \(\boldsymbol{X}^{k+1}\), an optimal solution to the following problem (see, e.g., [1, 44]):
Let us denote by Z (resp. Φ) the matrix in \(\mathbb{R}^{m \times n}\) (resp. \(\mathbb{R}^{d \times n}\)) whose i-th column is the vector zi (resp. ϕi), and define \(\boldsymbol{D}^{k} := (\boldsymbol{\varPhi}\boldsymbol{\varPhi}^{\top})^{-1/2}(\boldsymbol{\varPhi}\boldsymbol{Z}^{\top})(\boldsymbol{\varTheta}^{k})^{1/2}\). A reduced-rank regression estimate \(\boldsymbol{X}^{k+1}\) of (35) is given by (see [1])
where \(\{\lambda_{t}\}\) are the singular values of the matrix \(\boldsymbol{D}^{k}\), and \(\{\boldsymbol{u}_{t}\}\) and \(\{\boldsymbol{v}_{t}\}\) are the corresponding left and right singular vectors of \(\boldsymbol{D}^{k}\). Then, for fixed \(\boldsymbol{X} = \boldsymbol{X}^{k+1}\), Al-M computes the point \(\boldsymbol{\varTheta}^{k+1}\) using (17). Note that the Al-M method has no tuning parameter.
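The closed-form display referred to above is not reproduced in this excerpt. The following NumPy sketch assumes the classical weighted reduced-rank regression solution (see, e.g., [1, 49]): the r leading singular triplets of \(\boldsymbol{D}^{k}\) are kept and mapped back to the original variable. The function names are ours.

```python
import numpy as np

def sym_power(M, p):
    """M**p for a symmetric positive definite matrix M, via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def al_m_x_update(Z, Phi, Theta, r):
    """One Al-M update of X: rank-r weighted reduced-rank regression estimate.

    Z, Phi : (m, n) outputs and (d, n) inputs stored column-wise.
    Theta  : (m, m) current estimate Theta^k (symmetric positive definite).
    r      : target rank of X.
    """
    PhiPhi_m12 = sym_power(Phi @ Phi.T, -0.5)                 # (Phi Phi^T)^(-1/2)
    D = PhiPhi_m12 @ (Phi @ Z.T) @ sym_power(Theta, 0.5)      # D^k as defined above
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Rank-r part of (D^k)^T: sum_{t <= r} lambda_t v_t u_t^T.
    Dt_r = (Vt[:r].T * s[:r]) @ U[:, :r].T
    # Map back to X^{k+1} (classical weighted reduced-rank regression formula).
    return sym_power(Theta, -0.5) @ Dt_r @ PhiPhi_m12
```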

The Al-GD method differs from Al-M in that it performs a single gradient descent iteration for solving the problem (35) (see [9]). In particular, \(\boldsymbol{X}^{k+1}\) is computed as follows:
where the step size \(\eta_{\boldsymbol{X}}\) is a tuning parameter. Al-GD then computes the point \(\boldsymbol{\varTheta}^{k+1}\) using (17).
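The displayed update is likewise omitted from this excerpt. The sketch below assumes a projected-gradient form of the step (one gradient step on the weighted least-squares loss, followed by a truncated-SVD projection onto the rank-r set); eta_x plays the role of \(\eta_{\boldsymbol{X}}\), and the rank projection is our assumption.

```python
import numpy as np

def rank_r_projection(X, r):
    """Project X onto the set of matrices of rank at most r (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def al_gd_x_update(X, Z, Phi, Theta, r, eta_x):
    """One Al-GD update of X: gradient step on the weighted loss, then rank-r projection."""
    n = Z.shape[1]
    residual = Z - X @ Phi                          # columns are z_i - X phi_i
    grad = -(2.0 / n) * Theta @ residual @ Phi.T    # gradient of the weighted loss in X
    return rank_r_projection(X - eta_x * grad, r)
```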

The J-GD method does not update the two variables alternately; instead, it takes one gradient descent step in the joint variable (X,Θ) (see [9]). The estimate \(\boldsymbol{X}^{k+1}\) is computed in the same way as in (4), while the point \(\boldsymbol{\varTheta}^{k+1}\) is computed by a gradient descent step for (16) at the point \((\boldsymbol{X}^{k}, \boldsymbol{\varTheta}^{k})\) as follows:
where the step size \(\eta_{\boldsymbol{\varTheta}}\) is a tuning parameter and

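Since the display of this Θ-step is not reproduced here, the sketch below assumes that (16) contains the usual Gaussian negative log-likelihood term \(-\log\det(\boldsymbol{\varTheta})\) in addition to the weighted residual term; the symmetrization and the remark on positive definiteness are ours.

```python
import numpy as np

def j_gd_theta_update(X, Theta, Z, Phi, eta_theta):
    """One J-GD gradient step on Theta, assuming the objective
    -log det(Theta) + (1/n) sum_i (z_i - X phi_i)^T Theta (z_i - X phi_i)."""
    n = Z.shape[1]
    residual = Z - X @ Phi                   # (m, n) residuals z_i - X phi_i
    S = (residual @ residual.T) / n          # empirical residual covariance
    grad = S - np.linalg.inv(Theta)          # gradient of the assumed objective in Theta
    Theta_new = Theta - eta_theta * grad
    # Keep the iterate symmetric; in practice a safeguard (e.g., a line search or
    # eigenvalue thresholding) may be needed to preserve positive definiteness.
    return (Theta_new + Theta_new.T) / 2.0
```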
Cite this article
Le Thi, H.A., Ho, V.T. Alternating DCA for reduced-rank multitask linear regression with covariance matrix estimation. Ann Math Artif Intell 90, 809–829 (2022). https://doi.org/10.1007/s10472-021-09732-8
Keywords
- Reduced-rank multitask linear regression
- Covariance matrix estimation
- DC programming
- DCA
- Partial DC program
- Alternating DCA