Abstract
We study a challenging problem in machine learning: reduced-rank multitask linear regression with covariance matrix estimation. The objective is to build a linear relationship between the multiple output variables and the input variables of a multitask learning process, taking into account the general covariance structure of the regression errors on the one hand, and the reduced-rank structure of the regression model on the other hand. The problem is formulated as the minimization of a nonconvex function in two joint matrix variables (X,Θ) under a low-rank constraint on X and a positive definiteness constraint on Θ. It is doubly difficult because of the nonconvexity of the objective function and the low-rank constraint. We investigate a nonconvex, nonsmooth optimization approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) for this hard problem. A penalty reformulation, which takes the form of a partial DC program, is considered. An alternating DCA and its inexact version are developed; both algorithms converge to a weak critical point of the considered problem. Numerical experiments are performed on several synthetic and benchmark real multitask linear regression datasets. The numerical results demonstrate the performance of the proposed algorithms and their superiority over three classical alternating/joint methods.
References
Aldrin, M.: Reduced-Rank Regression. Encyclopedia of Environmetrics, Vol. 3. Wiley, pp. 1724–1728 (2002)
Chen, L., Huang, J.Z.: Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. J. Am. Stat. Assoc. 107(500), 1533–1545 (2012)
Chen, L., Huang, J.Z.: Sparse reduced-rank regression with covariance estimation. Stat. Comput. 26(1), 461–470 (2016)
Cover, T.M., Thomas, A.: Determinant inequalities via information theory. SIAM J. Matrix Anal. Appl. 9(3), 384–392 (1988)
Dev, H., Sharma, N.L., Dawson, S.N., Neal, D.E., Shah, N.: Detailed analysis of operating time learning curves in robotic prostatectomy by a novice surgeon. BJU Int. 109(7), 1074–1080 (2012)
Dubois, B., Delmas, J.F., Obozinski, G.: Fast algorithms for sparse reduced-rank regression. In: Chaudhuri, K., Sugiyama, M. (eds.) Proceedings of Machine Learning Research, vol. 89, pp. 2415–2424. PMLR (2019)
Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1, 211–218 (1936)
Foygel, R., Horrell, M., Drton, M., Lafferty, J.: Nonparametric reduced rank regression. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp 1628–1636. Curran Associates, Inc. (2012)
Ha, W., Foygel Barber, R.: Alternating minimization and alternating descent over nonconvex sets. ArXiv e-prints arXiv:1709.04451 (2017)
Harrison, L., Penny, W., Friston, K.: Multivariate autoregressive modeling of fMRI time series. Neuroimage 19, 1477–1491 (2003)
He, D., Parida, L., Kuhn, D.: Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction. Bioinformatics 32(12), i37–i43 (2016)
Hu, Z., Nie, F., Wang, R., Li, X.: Low rank regularization: A review. Neural Networks. In Press. Available online 31 October 2020. https://doi.org/10.1016/j.neunet.2020.09.021 (2020)
Hyams, E., Mullins, J., Pierorazio, P., Partin, A., Allaf, M., Matlaga, B.: Impact of robotic technique and surgical volume on the cost of radical prostatectomy. J. Endourol. 27(3), 298–303 (2013)
Ioffe, A., Tihomirov, V.: Theory of extremal problems. North-Holland (1979)
Izenman, A.J.: Reduced-rank regression for the multivariate linear model. J. Multivar. Anal. 5(2), 248–264 (1975)
Koshi, S.: Convergence of convex functions and duality. Hokkaido Math. J. 14(3), 399–414 (1985)
Le, H.M., Le Thi, H.A., Nguyen, M.C.: Sparse semi-supervised support vector machines by DC programming and DCA. Neurocomputing 153, 62–76 (2015)
Le Thi, H.A.: Analyse numérique des algorithmes de l’optimisation DC. approches locale et globale. codes et simulations numériques en grande dimension. applications. Ph.D. thesis, University of Rouen France (1994)
Le Thi, H.A.: Solving large scale molecular distance geometry problems by a smoothing technique via the Gaussian transform and D.C. programming. J. Glob. Optim. 27(1), 375–397 (2003)
Le Thi, H.A.: Portfolio selection under downside risk measures and cardinality constraints based on DC programming and DCA. Comput. Manag. Sci. 6 (4), 459–475 (2009)
Le Thi, H.A.: DC Programming and DCA for supply chain and production management: state-of-the-art models and methods. Int. J. Prod. Res. 58 (20), 6078–6114 (2020)
Le Thi, H.A., Ho, V.T.: Online learning based on online DCA and application to online classification. Neural Comput. 32(4), 759–793 (2020)
Le Thi, H.A., Ho, V.T., Pham Dinh, T.: A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning. J. Glob. Optim. 73(2), 279–310 (2019)
Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: DC Programming and DCA for General DC Programs. In: Van Do, T., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge Engineering, vol. 282, pp 15–35. Springer International Publishing (2014)
Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: Alternating DC Algorithm for Partial DC Programming. Technical report, University of Lorraine (2016)
Le Thi, H.A., Le, H.M., Pham Dinh, T.: New and efficient DCA based algorithms for minimum sum-of-squares clustering. Pattern Recogn. 47 (1), 388–401 (2014)
Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: Stochastic DCA for the large-sum of non-convex functions problem and its application to group variable selection in classification. In: Proceedings of the 34th International Conference on Machine Learning, Vol. 70. JMLR.org, pp. 3394–3403 (2017)
Le Thi, H.A., Nguyen, M.C.: DCA Based algorithms for feature selection in multi-class support vector machine. Ann. Oper. Res. 249(1), 273–300 (2017)
Le Thi, H.A., Nguyen, M.C., Pham Dinh, T.: A DC programming approach for finding communities in networks. Neural Comput. 26(12), 2827–2854 (2014)
Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133(1–4), 23–46 (2005)
Le Thi, H.A., Pham Dinh, T.: Difference of convex functions algorithms (DCA) for image restoration via a Markov random field model. Optim. Eng. 18 (4), 873–906 (2017)
Le Thi, H.A., Pham Dinh, T.: DC Programming and DCA: thirty years of developments. Math. Program., Special Issue: DC Programming - Theory, Algorithms and Applications 169(1), 5–68 (2018)
Le Thi, H.A., Pham Dinh, T., Le, H.M., Vo, X.T.: DC Approximation approaches for sparse optimization. Eur. J. Oper. Res. 244(1), 26–46 (2015)
Le Thi, H.A., Pham Dinh, T., Ngai, H.V.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52(3), 509–535 (2012)
Le Thi, H.A., Phan, D.N.: DC Programming and DCA for sparse optimal scoring problem. Neurocomputing 186, 170–181 (2016)
Le Thi, H.A., Ta, A.S., Pham Dinh, T.: An efficient DCA based algorithm for power control in large scale wireless networks. Appl. Math. Comput. 318, 215–226 (2018)
Lee, C.L., Lee, C.A., Lee, J.: Handbook of Quantitative Finance and Risk Management. Springer, USA (2010)
Magnus, J.R., Neudecker, H.: Matrix differential calculus with applications to simple, Hadamard, and Kronecker products. J. Math. Psychol. 29(4), 474–492 (1985)
Nguyen, M.N., Le Thi, H.A., Daniel, G., Nguyen, T.A.: Smoothing techniques and difference of convex functions algorithms for image reconstructions. Optim. 69(7-8), 1601–1633 (2020)
Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
Pham Dinh, T., Le Thi, H.A.: DC Optimization algorithms for solving the trust region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
Pham Dinh, T., Le Thi, H.A.: Recent Advances in DC Programming and DCA. In: Nguyen, N.T., Le Thi, H.A. (eds.) Transactions on Computational Intelligence XIII, vol. 8342, pp 1–37. Springer, Berlin (2014)
Phan, D.N., Le Thi, H.A.: Group variable selection via ℓp,0 regularization and application to optimal scoring. Neural Netw. 118, 220–234 (2019)
Reinsel, G.C., Velu, R.P.: Multivariate Reduced-Rank Regression: Theory and Applications, 1st edn. Lecture Notes in Statistics 136. Springer, New York (1998)
Salinetti, G., Wets, R.J.: On the relations between two types of convergence for convex functions. J. Math. Anal. Appl. 60(1), 211–226 (1977)
Smith, A.E., Coit, D.W.: Constraint-handling techniques - penalty functions. In: Handbook of Evolutionary Computation, Oxford University Press, pp. C5.2:1–C5.2.6 (1997)
Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., Vlahavas, I.: Multi-target regression via input space expansion: treating targets as inputs. Mach. Learn. 104(1), 55–98 (2016)
Tran, T.T., Le Thi, H.A., Pham Dinh, T.: DC Programming and DCA for enhancing physical layer security via cooperative jamming. Comput. Oper. Res. 87, 235–244 (2017)
Wold, S., Sjöström, M., Eriksson, L.: PLS-Regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58(2), 109–130 (2001)
Yuan, M., Ekici, A., Lu, Z., Monteiro, R.: Dimension reduction and coefficient estimation in multivariate linear regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 69(3), 329–346 (2007)
Zălinescu, C.: Convex analysis in general vector spaces. World Scientific (2002)
Ethics declarations
Conflict of Interests
The authors declare that there is no conflict of interest.
Appendices
Appendix A: Estimate of μΘ
The next lemma indicates how to estimate the value of μΘ so as to guarantee the convexity of the function \(\widetilde{H}(\cdot,\boldsymbol{\varTheta})\) defined in Theorem 2. In this lemma, ∥⋅∥2 denotes the ℓ2-norm (or spectral norm) of a matrix.
Lemma 1
For each fixed \(\boldsymbol{\varTheta} \in \mathbb{R}^{m \times m}\), if \(\mu_{\boldsymbol{\varTheta}} \leq 1/\left(\left\|\frac{2}{n}\sum_{i=1}^{n}\boldsymbol{\phi}_{i}\boldsymbol{\phi}_{i}^{\top}\right\|_{2}\|\boldsymbol{\varTheta}\|_{2} + 2\alpha\right)\), then \(\widetilde{H}(\cdot,\boldsymbol{\varTheta})\) is convex.
Proof
Knowing that the function \(\max_{\boldsymbol{Y} \in \mathcal{X}} (2\langle \boldsymbol{X}, \boldsymbol{Y} \rangle - \|\boldsymbol{Y}\|_{F}^{2})\) is convex in X and that the sum of two convex functions is convex, it suffices to choose μΘ such that the function \(\left(\frac{1}{2\mu_{\boldsymbol{\varTheta}}}-\alpha\right) \|\boldsymbol{X}\|_{F}^{2} - \frac{1}{n} \sum_{i=1}^{n} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})^{\top} \boldsymbol{\varTheta} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})\) is convex. To this end, we can take \(\frac{1}{\mu_{\boldsymbol{\varTheta}}}-2\alpha\) greater than or equal to the spectral radius of the Hessian matrix of \({\varLambda}(\boldsymbol{X}) = \frac{1}{n} \sum_{i=1}^{n} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})^{\top} \boldsymbol{\varTheta} (\boldsymbol{z}_{i}-\boldsymbol{X} \boldsymbol{\phi}_{i})\), i.e. \(\frac{1}{\mu_{\boldsymbol{\varTheta}}}-2\alpha \geq \rho(\nabla^{2} {\varLambda}(\boldsymbol{X}))\). From matrix differential calculus (see, e.g., [38]), we have
\[\nabla^{2} {\varLambda}(\boldsymbol{X}) = \left(\frac{2}{n} \sum_{i=1}^{n}\boldsymbol{\phi}_{i} \boldsymbol{\phi}_{i}^{\top} \right) \otimes \boldsymbol{\varTheta},\]
where ⊗ denotes the Kronecker product. Since \(\nabla^{2}{\varLambda}(\boldsymbol{X})\) is symmetric, we obtain \(\rho(\nabla^{2} {\varLambda}(\boldsymbol{X})) = \|\nabla^{2} {\varLambda}(\boldsymbol{X})\|_{2} = \left\|\left(\frac{2}{n} \sum_{i=1}^{n}\boldsymbol{\phi}_{i} \boldsymbol{\phi}_{i}^{\top}\right) \otimes \boldsymbol{\varTheta}\right\|_{2} = \left\|\frac{2}{n} \sum_{i=1}^{n}\boldsymbol{\phi}_{i} \boldsymbol{\phi}_{i}^{\top}\right\|_{2} \|\boldsymbol{\varTheta}\|_{2}\). This completes the proof. □
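For illustration, the bound of Lemma 1 can be evaluated directly from the data. The following minimal NumPy sketch (the function name and the random test data are ours, for illustration only) computes the right-hand side of the inequality in Lemma 1.

```python
import numpy as np

def mu_theta_bound(Phi, Theta, alpha):
    """Largest admissible value of mu_Theta according to Lemma 1.

    Phi   : (d, n) matrix whose columns are the input vectors phi_i.
    Theta : (m, m) symmetric positive definite matrix.
    alpha : nonnegative regularization parameter of the model.
    """
    n = Phi.shape[1]
    # Spectral norm of (2/n) * sum_i phi_i phi_i^T = (2/n) * Phi Phi^T.
    norm_phi = np.linalg.norm((2.0 / n) * (Phi @ Phi.T), 2)
    # Spectral norm of Theta.
    norm_theta = np.linalg.norm(Theta, 2)
    # Any mu_Theta less than or equal to this value keeps H~(., Theta) convex.
    return 1.0 / (norm_phi * norm_theta + 2.0 * alpha)

# Example with random data: d = 5 inputs, m = 3 outputs, n = 100 samples.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((5, 100))
A = rng.standard_normal((3, 3))
Theta = A @ A.T + np.eye(3)  # symmetric positive definite
print(mu_theta_bound(Phi, Theta, alpha=0.1))
```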
Appendix B: Comparative algorithms for solving the problem (2)
The Al-M method alternates between updating the variables X and Θ at each iteration. In particular, at iteration k, for fixed Θ, we compute \(\boldsymbol{X}^{k+1}\), an optimal solution to the following problem (see, e.g., [1, 44]):
Let us denote by Z (resp. Φ) the matrix in \(\mathbb{R}^{m \times n}\) (resp. \(\mathbb{R}^{d \times n}\)) whose i-th column is the vector zi (resp. ϕi), and define \(\boldsymbol{D}^{k} := (\boldsymbol{\varPhi}\boldsymbol{\varPhi}^{\top})^{-1/2}(\boldsymbol{\varPhi}\boldsymbol{Z}^{\top})(\boldsymbol{\varTheta}^{k})^{1/2}\). A reduced-rank regression estimate \(\boldsymbol{X}^{k+1}\) of (35) is given by (see [1])
where \(\{\lambda_{t}\}\) are the singular values of the matrix \(\boldsymbol{D}^{k}\), and \(\{\boldsymbol{u}_{t}\}\) and \(\{\boldsymbol{v}_{t}\}\) are the corresponding left and right singular vectors of \(\boldsymbol{D}^{k}\). Then, for fixed \(\boldsymbol{X} = \boldsymbol{X}^{k+1}\), Al-M computes the point \(\boldsymbol{\varTheta}^{k+1}\) using (17). Note that the Al-M method has no tuning parameter.
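The closed-form display referred to above is not reproduced in this excerpt. The following NumPy sketch assumes the classical weighted reduced-rank regression solution (see, e.g., [1, 49]): the r leading singular triplets of \(\boldsymbol{D}^{k}\) are kept and mapped back to the original variable. The function names are ours.

```python
import numpy as np

def sym_power(M, p):
    """M**p for a symmetric positive definite matrix M, via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def al_m_x_update(Z, Phi, Theta, r):
    """One Al-M update of X: rank-r weighted reduced-rank regression estimate.

    Z, Phi : (m, n) outputs and (d, n) inputs stored column-wise.
    Theta  : (m, m) current estimate Theta^k (symmetric positive definite).
    r      : target rank of X.
    """
    PhiPhi_m12 = sym_power(Phi @ Phi.T, -0.5)                 # (Phi Phi^T)^(-1/2)
    D = PhiPhi_m12 @ (Phi @ Z.T) @ sym_power(Theta, 0.5)      # D^k as defined above
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Rank-r part of (D^k)^T: sum_{t <= r} lambda_t v_t u_t^T.
    Dt_r = (Vt[:r].T * s[:r]) @ U[:, :r].T
    # Map back to X^{k+1} (classical weighted reduced-rank regression formula).
    return sym_power(Theta, -0.5) @ Dt_r @ PhiPhi_m12
```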

The Al-GD method differs from Al-M in that it performs a single gradient descent iteration for solving the problem (35) (see [9]). In particular, \(\boldsymbol{X}^{k+1}\) is computed as follows:
where the step size \(\eta_{\boldsymbol{X}}\) is a tuning parameter. Al-GD then computes the point \(\boldsymbol{\varTheta}^{k+1}\) using (17).
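The displayed update is likewise omitted from this excerpt. The sketch below assumes a projected-gradient form of the step (one gradient step on the weighted least-squares loss, followed by a truncated-SVD projection onto the rank-r set); eta_x plays the role of \(\eta_{\boldsymbol{X}}\), and the rank projection is our assumption.

```python
import numpy as np

def rank_r_projection(X, r):
    """Project X onto the set of matrices of rank at most r (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def al_gd_x_update(X, Z, Phi, Theta, r, eta_x):
    """One Al-GD update of X: gradient step on the weighted loss, then rank-r projection."""
    n = Z.shape[1]
    residual = Z - X @ Phi                          # columns are z_i - X phi_i
    grad = -(2.0 / n) * Theta @ residual @ Phi.T    # gradient of the weighted loss in X
    return rank_r_projection(X - eta_x * grad, r)
```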

The J-GD method does not update the two variables alternately; instead, it takes one gradient descent step in the joint variable (X,Θ) (see [9]). The estimate \(\boldsymbol{X}^{k+1}\) is computed in the same way as in (4), while the point \(\boldsymbol{\varTheta}^{k+1}\) is computed by a gradient descent step for (16) at the point \((\boldsymbol{X}^{k}, \boldsymbol{\varTheta}^{k})\) as follows:
where the step size \(\eta_{\boldsymbol{\varTheta}}\) is a tuning parameter and

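Since the display of this Θ-step is not reproduced here, the sketch below assumes that (16) contains the usual Gaussian negative log-likelihood term \(-\log\det(\boldsymbol{\varTheta})\) in addition to the weighted residual term; the symmetrization and the remark on positive definiteness are ours.

```python
import numpy as np

def j_gd_theta_update(X, Theta, Z, Phi, eta_theta):
    """One J-GD gradient step on Theta, assuming the objective
    -log det(Theta) + (1/n) sum_i (z_i - X phi_i)^T Theta (z_i - X phi_i)."""
    n = Z.shape[1]
    residual = Z - X @ Phi                   # (m, n) residuals z_i - X phi_i
    S = (residual @ residual.T) / n          # empirical residual covariance
    grad = S - np.linalg.inv(Theta)          # gradient of the assumed objective in Theta
    Theta_new = Theta - eta_theta * grad
    # Keep the iterate symmetric; in practice a safeguard (e.g., a line search or
    # eigenvalue thresholding) may be needed to preserve positive definiteness.
    return (Theta_new + Theta_new.T) / 2.0
```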
Cite this article
Le Thi, H.A., Ho, V.T. Alternating DCA for reduced-rank multitask linear regression with covariance matrix estimation. Ann Math Artif Intell 90, 809–829 (2022). https://doi.org/10.1007/s10472-021-09732-8
Keywords
- Reduced-rank multitask linear regression
- Covariance matrix estimation
- DC programming
- DCA
- Partial DC program
- Alternating DCA