Sparse precision matrix estimation with missing observations

Zhang, Ning; Yang, Jin

doi:10.1007/s00180-022-01265-w

Sparse precision matrix estimation with missing observations

Original paper
Published: 26 July 2022

Volume 38, pages 1337–1355, (2023)
Cite this article

Computational Statistics Aims and scope Submit manuscript

Ning Zhang¹ &
Jin Yang¹

370 Accesses
Explore all metrics

Abstract

Sparse Gaussian graphical models have been extensively applied to detect the conditional independence structures from fully observed data. However, datasets with missing observations are quite common in many practical fields. In this paper, we propose a robust Gaussian graphical model with the covariance matrix being estimated from the partially observed data. We prove that the inverse of the Karush–Kuhn–Tucker mapping associated with the proposed model satisfies the calmness condition automatically. We also apply a linearly convergent alternating direction method of multipliers to find the solution to the proposed model. The numerical performance is evaluated on both the synthetic data and real data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Bayesian Estimation of the Precision Matrix with Monotone Missing Data

Article 17 September 2020

Robust and sparse regression in generalized linear model by stochastic optimization

Article 11 June 2019

Sparse Information Filter for Fast Gaussian Process Regression

Notes

Available at: https://www.csie.ntu.edu.tw/~cjlin/libsvm .
Available at: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/.
Available at: https://ana.cachopo.org/datasets-for-single-label-text-categorization.

References

Ahn M, Pang J-S, Xin J (2017) Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J Optim 27(3):1637–1665
Article MathSciNet MATH Google Scholar
Cardoso-Cachopo A (2007) Improving methods for single-label text categorization. PhD Thesis, Instituto Superior Tecnico, Universidade Tecnica de Lisboa
Chang C-C, Lin C-J (2011) Libsvm: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):27
Google Scholar
Clarke FH, Ledyaev YS, Stern RJ, Wolenski PR (1998) Nonsmooth analysis and control theory, vol 178. Springer, New York
MATH Google Scholar
Dempster AP (1972) Covariance selection. Biometrics 28(1):157–175
Article MathSciNet Google Scholar
Dontchev A, Rockafellar R (2004) Regularity and conditioning of solution mappings in variational analysis. Set-Valued Anal 12(1–2):79–109
Article MathSciNet MATH Google Scholar
Dumais ST (1991) Improving the retrieval of information from external sources. Behav Res Methods Instrum Comput 23(2):229–236
Article Google Scholar
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Article MathSciNet MATH Google Scholar
Fan J, Liao Y, Liu H (2016) An overview of the estimation of large covariance and precision matrices. Economet J 19(1):C1–C32
Article MathSciNet MATH Google Scholar
Fan R, Jang B, Sun Y, Zhou S (2019) Precision matrix estimation with noisy and missing data. In: The 22nd international conference on artificial intelligence and statistics, vol 89, pp 2810–2819. PMLR
Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441
Article MATH Google Scholar
Guo J, Levina E, Michailidis G, Zhu J (2011) Joint estimation of multiple graphical models. Biometrika 98(1):1–15
Article MathSciNet MATH Google Scholar
Han D, Sun D, Zhang L (2018) Linear rate convergence of the alternating direction method of multipliers for convex composite programming. Math Oper Res 43(2):622–637
Article MathSciNet MATH Google Scholar
Hsieh CJ, Sustik MA, Dhillon IS, Ravikumar P (2014) QUIC: quadratic approximation for sparse inverse covariance estimation. J Mach Learn Res 15:2911–2947
MathSciNet MATH Google Scholar
Kolar M, Xing EP (2012) Estimating sparse precision matrices from data with missing values. In: Proceedings of the 29th international conference on machine learning, Edinburgh, Scotland, UK, pp 635–642
Loh P-L, Wainwright MJ (2015) Regularized m-estimators with nonconvexity: statistical and algorithmic theory for local optima. J Mach Learn Res 16(1):559–616
MathSciNet MATH Google Scholar
Lounici K (2014) High-dimensional covariance matrix estimation with missing observations. Bernoulli 20(3):1029–1058
Article MathSciNet MATH Google Scholar
Lu L, Toh KC (2010) An inexact interior point method for L1-regularized sparse covariance selection. Math Program Comput 2(3–4):291–315
MathSciNet MATH Google Scholar
Moreau J-J (1965) Proximité et dualité dans un espace Hilbertien. Bull Soc Math France 93(2):273–299
Article MathSciNet MATH Google Scholar
Park S, Lim J (2019) Non-asymptotic rate for high-dimensional covariance estimation with non-independent missing observations. Stat Probab Lett 153:113–123
Article MathSciNet MATH Google Scholar
Park S, Wang X, Lim J (2020) Estimating high-dimensional covariance and precision matrices under general missing dependence. arXiv preprint arXiv:2006.04632
Pavez E, Ortega A (2021) Covariance matrix estimation with non uniform and data dependent missing observations. IEEE Trans Inf Theory 67(2):1201–1215
Article MathSciNet MATH Google Scholar
Rockafellar TR, Wets RJ-B (1998) Variational analysis. Sobolev BV Sp MPS-SIAM Ser Optim 30:324–326
MATH Google Scholar
Rothman AJ, Bickel PJ, Levina E, Zhu J (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515
Article MathSciNet MATH Google Scholar
Seaman SR, White IR (2013) Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res 22(3):278–295
Article MathSciNet Google Scholar
Städler N, Bühlmann P (2012) Missing values: sparse inverse covariance estimation and an extension to sparse regression. Stat Comput 22(1):219–235
Article MathSciNet MATH Google Scholar
Sun DF (2006) The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications. Math Oper Res 31(4):761–776
Article MathSciNet MATH Google Scholar
Sun DF, Qi L (2001) Solving variational inequality problems via smoothing-nonsmooth reformulations. J Comput Appl Math 129(1–2):37–62
Article MathSciNet MATH Google Scholar
Sun S, Huang R, Gao Y (2012) Network-scale traffic modeling and forecasting with graphical lasso and neural networks. J Transp Eng 138(11):1358–1367
Article Google Scholar
Tang P, Wang C, Sun D, Toh K-C (2020) A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems. J Mach Learn Res 21(226):1–38
MathSciNet MATH Google Scholar
Wang C, Sun D, Toh K-C (2010) Solving log-determinant optimization problems by a Newton-CG primal proximal point algorithm. SIAM J Optim 20(6):2994–3013
Article MathSciNet MATH Google Scholar
Wang T, Ren Z, Ding Y, Fang Z, Sun Z, MacDonald ML, Sweet RA, Wang J, Chen W (2016) Fastggm: an efficient algorithm for the inference of gaussian graphical model in biological networks. PLoS Comput Biol 12(2):e1004755
Article Google Scholar
Ye JJ, Ye XY (1997) Necessary optimality conditions for optimization problems with variational inequality constraints. Math Oper Res 22(4):977–997
Article MathSciNet MATH Google Scholar
Yosida K (1964) Functional analysis. Springer, Berlin
MATH Google Scholar
Yu Y-L (2013) On decomposing the proximal map. In: Proceedings of advances in neural information processing systems, pp 91–99
Yuan X (2012) Alternating direction method for covariance selection models. J Sci Comput 51(2):261–273
Article MathSciNet MATH Google Scholar
Yuan M, Lin Y (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94(1):19–35
Article MathSciNet MATH Google Scholar
Yuan X, Zeng S, Zhang J (2020) Discerning the linear convergence of admm for structured convex optimization through the lens of variational analysis. J Mach Learn Res 21:1–75
MathSciNet MATH Google Scholar
Zerenner T, Friederichs P, Lehnertz K, Hense A (2014) A gaussian graphical model approach to climate networks. Chaos 24(2):023103
Article MathSciNet MATH Google Scholar
Zhang C-H (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
Article MathSciNet MATH Google Scholar
Zhang A, Fang J, Liang F, Calhoun VD, Wang Y-P (2018) Aberrant brain connectivity in schizophrenia detected via a fast gaussian graphical model. IEEE J Biomed Health Inf 23(4):1479–1489
Article Google Scholar
Zhang Y, Zhang N, Sun D, Toh KC (2020) A proximal point dual newton algorithm for solving group graphical lasso problems. SIAM J Optim 30(3):2197–2220
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science and Technology, Dongguan University of Technology, Dongguan, 523808, China
Ning Zhang & Jin Yang

Authors

Ning Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jin Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ning Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research of this author was supported by the National Natural Science Foundation of China (11901083, 12171153), Guangdong Basic and Applied Basic Research Foundation (2022A1515010088).

Appendix

1.1 Proof of Lemma 2.1

Lemma 6.1

(Yu 2013, Theorem 1) Let f and g be two closed convex proper functions. A sufficient condition for $\mathrm{Prox}_{f+g}=\mathrm{Prox}_f\circ \mathrm{Prox}_g$ is

$$\begin{aligned} \partial g(x) \subseteq \partial g\left( \mathrm{Prox}_f(x)\right) ,\,\,\forall \,x \end{aligned}$$

Let $p:{\mathbb {S}}^n\rightarrow {\mathbb {R}}\cup \{+\infty \}$ be defined by

$$\begin{aligned} p(x)=\lambda \Vert x\Vert _{1,\mathrm{off}}+\varphi (x),\,\,\varphi (x):={\mathbb {I}}_{{\mathcal {C}}}(x), \,\,\, {\mathcal {C}}:=\left\{ x:\,\Vert x\Vert \le \alpha \right\} . \end{aligned}$$

(6.1)

It follows from the definition of proximal mapping that

$$\begin{aligned} \mathrm{Prox}_{\varphi }(x)=\Pi _{{\mathcal {C}}}(x)=\left\{ \begin{array}{ll} x,&{}\Vert x\Vert \le \alpha ,\\ \displaystyle \frac{\alpha x}{\Vert x\Vert },&{}\Vert x\Vert >\alpha . \end{array} \right. \end{aligned}$$

(6.2)

Lemma 6.2

Let function p be defined by (6.1), it holds that

$$\begin{aligned} \mathrm{Prox}_{p}= \Pi _{{\mathcal {C}}}\circ \mathrm{Prox}_{\lambda \Vert \cdot \Vert _{1,\mathrm{off}}}. \end{aligned}$$

Proof

From Lemma 6.1, it is sufficient to show that

$$\begin{aligned} \partial \mathrm{Prox}_{\lambda \Vert \cdot \Vert _{1,\mathrm{off}}}(x) \subseteq \partial \mathrm{Prox}_{\lambda \Vert \cdot \Vert _{1,\mathrm{off}}}(y),\,\,y=\Pi _{{\mathcal {C}}}(x),\,\,\forall x\in {\mathbb {R}}^m. \end{aligned}$$

(6.3)

From (6.2), it is sufficient to consider the following cases:

(a)
If $\Vert x\Vert \le \alpha $, then $y=x$. Therefore, the relationship (6.3) holds.
(b)
If $\Vert x\Vert >\alpha $, then $\displaystyle y={\alpha x}/{\Vert x\Vert }$, which means $\mathrm{sgn}(y)=\mathrm{sgn}(x)$. Therefore, the relationship (6.3) also holds in this case.

The proof is completed. □

1.2 Proof of Proposition 2.1

Lemma 6.3

(Zhang et al. 2020, Lemma 3.1) Let $f(x):=-\mathop {\mathrm {log\,det}}\limits \,x$. Then all ${\mathcal {G}}_f\in \partial \mathrm{Prox}_f(Z)$ are self-adjoint and positive definite with $\lambda _{\max }({{\mathcal {G}}}_f)<1$.

Lemma 6.4

Let $x\in {\mathbb {S}}^n$ and ${\mathcal {B}}:{\mathbb {S}}^n\rightarrow {\mathbb {S}}^n$ be any self-adjoint positive definite operator, p is the function defined in Lemma2.1. Then, for any chosen ${\mathcal {G}}_{p}\in \partial \mathrm{Prox}_{p}(x)$, the linear operator $I-{\mathcal {G}}_{p}+{\mathcal {G}}_{p}{\mathcal {B}}$ is nonsingular.

Proof

It follows Lemma 6.1 that $\mathrm{Prox}_p$ is the projection onto the closed convex set ${\mathcal {C}}$. Therefore, we know from Sun and Qi (2001, Theorem 2.3) that any element ${\mathcal {G}}_p\in \partial \mathrm{Prox}_{p}(x)$ is self-adjoint, positive definite, and $\lambda _{\max }({\mathcal {G}}_p)\in [0,1]$. The proof can be completed by Zhang et al. (2020, Lemma 3.2). □

Lemma 6.5

Let ${\mathcal {K}}_{pert}$ be the KKT mapping defined by (2.3), $({\bar{x}},{\bar{y}},{\bar{z}})$ be the KKT point of problem (1.4). Then, Any element in $\partial _{(x,y,z)}{\mathcal {K}}_{pert}(({\bar{x}},{\bar{y}},{\bar{z}}),(0,0,0))$ is nonsingular.

Proof

Since $\mathrm{Prox}_p$ is directionally differentiable, it follows from the chain rule presented in Sun (2006, Lemma 2.1) that for any ${\mathcal {G}}\in \partial _{(x,y,z)}{\mathcal {K}}_{pert}(({\bar{x}},{\bar{y}},{\bar{z}}),(0,0,0))$, there exist ${\mathcal {G}}_{f}\in \partial \mathrm{Prox}_{f}({\bar{x}}-{\bar{z}}-{\hat{s}})$ and ${\mathcal {G}}_{p}\in \partial \mathrm{Prox}_p({\bar{y}}+{\bar{z}})$ such that

$$\begin{aligned} {{\mathcal {G}}}(\Delta x,\Delta y,\Delta z)= \left( \begin{array}{c} \Delta x-{\mathcal {G}}_{f}(\Delta z+\Delta x)\\ \Delta y-{\mathcal {G}}_{p}(\Delta y+\Delta z)\\ \Delta x-\Delta y \end{array} \right) ,\,\,\,\forall \, (\Delta x,\Delta y,\Delta z)\in {\mathbb {S}}^n\times {\mathbb {S}}^n\times {\mathbb {S}}^n. \end{aligned}$$

Suppose that there exists $(\Delta x,\Delta y,\Delta z)\in {\mathbb {S}}^n\times {\mathbb {S}}^n\times {\mathbb {S}}^n$ such that ${{\mathcal {G}}}(\Delta x,\Delta y,\Delta z)=0$, i.e.,

$$\begin{aligned} \left\{ \begin{array}{l} \Delta x-{\mathcal {G}}_{f}(\Delta z+\Delta x)=0,\\ \Delta y-{\mathcal {G}}_{p}(\Delta y+\Delta z)=0,\\ \Delta x-\Delta y=0. \end{array} \right. \end{aligned}$$

(6.4)

It follows from Lemma 6.3 that both ${\mathcal {G}}_{f}$ and ${\mathcal {G}}^{-1}_{f}-I$ are self-adjoint and positive definite. This, together with (6.4), implies that

$$\begin{aligned} \Delta z = \left( {\mathcal {G}}^{-1}_{f}-I\right) \Delta x\,\,\,\hbox {and}\,\,\, \left( I-{\mathcal {G}}_p+{\mathcal {G}}_p\left( {\mathcal {G}}^{-1}_f-I\right) \right) \Delta x=0. \end{aligned}$$

(6.5)

We know from Lemma 6.4 that $(I-{\mathcal {G}}_p+{\mathcal {G}}_p({\mathcal {G}}^{-1}_f-I))$ is nonsingular. This, together with (6.5), implies that

$$\begin{aligned} \Delta x=0,\,\,\Delta y =0,\,\,\hbox {and}\,\, \Delta z =0. \end{aligned}$$

Therefore, ${\mathcal {G}}$ is nonsingular. The proof is completed. □

In order to give the proof of Proposition 2.1, we recall the implicit theorem from Clarke et al. (1998). Let ${\mathbb {X}}$ be a Hilbert space and ${\mathbb {M}}$ be a metric space. Consider the equation

$$\begin{aligned} {\mathcal {H}}(x,\alpha ) =0, \end{aligned}$$

where ${\mathcal {H}}$ is a mapping from ${\mathbb {X}}\times {\mathbb {M}}$ to ${\mathbb {X}}$. Assume that $V\subseteq {\mathbb {X}}$ is an open set such that ${\mathcal {H}}$ is continuous on $V\times {\mathbb {M}}$ and such that the partial derivative $\partial _x{\mathcal {H}}(x,\alpha )$ exists for all $(x,\alpha )\in V\times {\mathbb {M}}$, and is continuous jointly in $(x,\alpha )\in V\times {\mathbb {M}}$.

The following result is from Clarke et al. (1998, Theorem 3.6), which is usually named as Clarke’s implicit function theorem.

Lemma 6.6

Let $(x_0,\alpha _0)\in V\times {\mathbb {M}}$ be a point satisfying ${\mathcal {H}}(x_0,\alpha _0)=0$. Then one has

(a)
If $\partial _x{\mathcal {H}}(x_0,\alpha _0)$ is onto and one to one, then there exist neighborhoods ${\mathcal {N}}_x$ of $x_0$ and ${\mathcal {N}}_{\alpha }$ of $\alpha _0$ and a unique continuous function ${\hat{x}}(\cdot ):{\mathcal {N}}_{\alpha }\rightarrow {\mathcal {N}}_{x}$ with ${\hat{x}}(\alpha _0)=x_0$ such that ${\mathcal {H}}({\hat{x}}(\alpha ),\alpha )=0,\,\,\forall \alpha \in {\mathcal {N}}_{\alpha }$.
(b)
If in addition ${\mathcal {H}}$ is Lipschitz in a neighborhood of $(x_0,\alpha _0)$, then ${\hat{x}}$ is Lipschitz.

Now, we are ready to present the proof of Proposition 2.1.

Proof

The global Lipschitz continuities of the proximal mappings $\mathrm{Prox}_{f}$ and $\mathrm{Prox}_{p}$ imply that the mapping ${\mathcal {K}}_{pert}$ defined by (2.3) is Lipschitz continuous. Therefore, the proof can be completed by Lemmas 6.5, 6.6, and the fact that for any (u, v, w), the set $\mathsf{Sol}(u,v,w)$ must be a singleton if it is nonempty. □

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, N., Yang, J. Sparse precision matrix estimation with missing observations. Comput Stat 38, 1337–1355 (2023). https://doi.org/10.1007/s00180-022-01265-w

Download citation

Received: 02 December 2021
Accepted: 13 July 2022
Published: 26 July 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00180-022-01265-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Sparse precision matrix estimation with missing observations

Abstract

Access this article

Similar content being viewed by others

Bayesian Estimation of the Precision Matrix with Monotone Missing Data

Robust and sparse regression in generalized linear model by stochastic optimization

Sparse Information Filter for Fast Gaussian Process Regression

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

1.1 Proof of Lemma 2.1

Lemma 6.1

Lemma 6.2

Proof

1.2 Proof of Proposition 2.1

Lemma 6.3

Lemma 6.4

Proof

Lemma 6.5

Proof

Lemma 6.6

Proof

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Sparse precision matrix estimation with missing observations

Abstract

Access this article

Similar content being viewed by others

Bayesian Estimation of the Precision Matrix with Monotone Missing Data

Robust and sparse regression in generalized linear model by stochastic optimization

Sparse Information Filter for Fast Gaussian Process Regression

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

1.1 Proof of Lemma 2.1

Lemma 6.1

Lemma 6.2

Proof

1.2 Proof of Proposition 2.1

Lemma 6.3

Lemma 6.4

Proof

Lemma 6.5

Proof

Lemma 6.6

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation