Abstract
The spatial autoregressive (SAR) model is a classical model in spatial econometrics and has become an important tool in network analysis. However, with large-scale networks, existing methods of likelihood-based inference for the SAR model become computationally infeasible. We here investigate maximum likelihood estimation for the SAR model with partially observed responses from large-scale networks. By taking advantage of recent developments in randomized numerical linear algebra, we derive efficient algorithms to estimate the spatial autocorrelation parameter in the SAR model. Compelling experimental results from extensive simulation and real data examples demonstrate empirically that the estimator obtained by our method, called the randomized maximum likelihood estimator, outperforms the state of the art by giving smaller bias and standard error, especially for large-scale problems with moderate spatial autocorrelation. The theoretical properties of the estimator are explored, and consistency results are established.
References
Anselin, L., Bera, A.K.: Spatial dependence in linear regression models with an introduction to spatial econometrics. Stat. Textb. Monogr. 155, 237–290 (1998)
Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(4), 825–848 (2008)
Banerjee, S., Carlin, B.P., Gelfand, A.E.: Hierarchical Modeling and Analysis for Spatial Data. CRC Press, Boca Raton (2014)
Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1–3), 41–54 (1999)
Beck, N., Gleditsch, K.S., Beardsley, K.: Space is more than geography: Using spatial econometrics in the study of political economy. Int. Stud. Q. 50(1), 27–44 (2006)
Boutsidis, C., Drineas, P., Kambadur, P., Kontopoulou, E.M., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv preprint arXiv:1503.00374 (2015)
Browne, K.: Snowball sampling: using social networks to research non-heterosexual women. Int. J. Soc. Res. Methodol. 8(1), 47–60 (2005)
Burden, S., Cressie, N., Steel, D.G.: The SAR model for very large datasets: a reduced rank approach. Econometrics 3(2), 317–338 (2015)
Chen, X., Chen, Y., Xiao, P.: The impact of sampling and network topology on the estimation of social intercorrelations. J. Market. Res. 50(1), 95–110 (2013)
Cressie, N., Johannesson, G.: Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 209–226 (2008)
Darmofal, D.: Spatial Analysis for the Social Sciences. Cambridge University Press, Cambridge (2015)
Doreian, P.: Estimating linear models with spatially distributed data. Sociol. Methodol. 12, 359–388 (1981)
Doreian, P., Freeman, L., White, D., Romney, A.: Models of network effects on social actors. In: Research Methods in Social Network Analysis pp. 295–317 (1989)
Fujimoto, K., Chou, C.P., Valente, T.W.: The network autocorrelation model using two-mode data: affiliation exposure and potential bias in the autocorrelation parameter. Soc. Netw. 33(3), 231–243 (2011)
Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp. 1207–1214 (2012)
Haggett, P.: Hybridizing alternative models of an epidemic diffusion process. Econ. Geogr. 52(2), 136–146 (1976)
Lee, L.F.: Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72(6), 1899–1925 (2004)
Lee, L.F., Liu, X.: Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econ. Theory 26(1), 187–230 (2010)
Lee, L., Yu, J.: Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154(2), 165–185 (2010)
Lee, L.F., Liu, X., Lin, X.: Specification and estimation of social interaction models with network structures. Econ. J. 13(2), 145–176 (2010)
Leenders, R.T.: Modeling social influence through network autocorrelation: constructing the weight matrix. Soc. Netw. 24(1), 21–47 (2002)
LeSage, J., Pace, R.K.: Introduction to Spatial Econometrics. Chapman and Hall, Boca Raton (2009)
LeSage, J.P., Pace, R.K.: Models for spatially dependent missing data. J. Real Estate Financ. Econ. 29(2), 233–254 (2004)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford Large Network Dataset Collection (2014)
Lichstein, J.W., Simons, T.R., Shriner, S.A., Franzreb, K.E.: Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72(3), 445–463 (2002)
Lin, X., Lee, L.F.: GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157(1), 34–52 (2010)
Mahoney, M.W., et al.: Randomized algorithms for matrices and data. Found. Trends® Mach. Learn. 3(2), 123–224 (2011)
O’Malley, A.J.: The analysis of social network data: an exciting frontier for statisticians. Stat. Med. 32(4), 539–555 (2013)
Ord, K.: Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 70(349), 120–126 (1975)
OSC: Ohio Supercomputer Center. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73 (1987). Accessed 21 Dec 2018
Pace, R.K., Barry, R.: Sparse spatial autoregressions. Stat. Probab. Lett. 33(3), 291–297 (1997)
Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2006)
Robins, G.: A tutorial on methods for the modeling and analysis of social network data. J. Math. Psychol. 57(6), 261–274 (2013)
Robins, G., Pattison, P., Elliott, P.: Network models for social influence processes. Psychometrika 66(2), 161–189 (2001)
Shao, J.: Mathematical Statistics. Springer, New York (2003)
Smirnov, O., Anselin, L.: Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput. Stat. Data Anal. 35(3), 301–319 (2001)
Smirnov, O.A.: Computation of the information matrix for models with spatial interaction on a lattice. J. Comput. Graph. Stat. 14(4), 910–927 (2005)
Stewart, G.: Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix. Numer. Math. 83(2), 313–323 (1999)
Suesse, T.: Estimation of spatial autoregressive models with measurement error for large data sets. Comput. Stat. 33(4), 1627–1648 (2018)
Suesse, T.: Marginal maximum likelihood estimation of SAR models with missing data. Comput. Stat. Data Anal. 120, 98–110 (2018)
Suesse, T., Chambers, R.: Using social network information for survey estimation. J. Off. Stat. 34(1), 181–209 (2018)
Suesse, T., Zammit-Mangion, A.: Computational aspects of the em algorithm for spatial econometric models with missing data. J. Stat. Comput. Simul. 87(9), 1767–1786 (2017)
Sun, D., Tsutakawa, R.K., Speckman, P.L.: Posterior distribution of hierarchical models using CAR(1) distributions. Biometrika 86(2), 341–350 (1999)
Wang, S., Luo, L., Zhang, Z.: SPSD matrix approximation vis column selection: theories, algorithms, and extensions. J. Mach. Learn. Res. 17(49), 1–49 (2016)
Wang, W., Lee, L.F.: Estimation of spatial autoregressive models with randomly missing data in the dependent variable. Econ. J. 16(1), 73–102 (2013)
Whittle, P.: On stationary processes in the plane. Biometrika 41, 434–449 (1954)
Woodruff, D.P., et al.: Sketching as a tool for numerical linear algebra. Found. Trends® Theor. Comput. Sci. 10(1–2), 1–157 (2014)
Zhou, J., Tu, Y., Chen, Y., Wang, H.: Estimating spatial autocorrelation with sampled network data. J. Bus. Econ. Stat. 35(1), 130–138 (2017)
Acknowledgements
We would like to thank the associate editor and two reviewers of Statistics and Computing for their insightful comments that greatly improved this work. Li’s work is partially supported by the Henry Laws Fellowship Award and the Taft Research Center at the University of Cincinnati. Kang’s research is partially supported by the Simons Foundation Collaboration Award (#317298) and the Taft Research Center at the University of Cincinnati. This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center (OSC 1987). We would like to thank Dr. Shan Ba, Dr. Won Chang, Dr. Noel Cressie, Dr. Alex B. Konomi, and Dr. Siva Sivaganesan for their helpful suggestions.
Appendix
This section contains the proofs of theorems and lemmas for the paper.
A.1 Proof of Theorem 1
To prove Theorem 1, we first need to state and prove four lemmas.
Lemma 1
Assume that \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded for any n and N with \(n < N\). Then \(\Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \) is also bounded.
Proof
Since \( \varOmega _{22} \) is a symmetric positive semi-definite (SPSD) matrix, all its eigenvalues \( \tau _i \) are nonnegative. Let \( \tau _\mathrm{min}\) be the smallest eigenvalue of \(\varOmega _{22}. \) Since \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, there exists \( M_1 > 0\) such that \(\Vert \varOmega _{22}^{-1} \Vert _F \le M_1. \)
Since \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, \( \varOmega _{22} \) is invertible and \(\tau _\mathrm{min} \ne 0;\) therefore \(\tau _i >0\) for \(i = 1, \dots , N-n.\) Thus, \( \varOmega _{22} \) is symmetric positive definite (SPD). From Theorem 6 of Wang et al. (2016), \( {\tilde{ \varOmega }_{22}} ^{ss} \) is also SPD; thus, all its eigenvalues \(\sigma _i\), \(i=1,\dots ,N-n,\) are positive. Let \(\sigma _\mathrm{min}\) be the smallest eigenvalue of \( {\tilde{ \varOmega }_{22}} ^{ss} \). Then, there exists \( \displaystyle M_2 = \frac{(N-n)^{1/2}}{\sigma _\mathrm{min}} > 0\) such that \( \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \le M_2. \)
\(\square \)
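The bound used here, \( \Vert \varOmega _{22}^{-1} \Vert _F \le (N-n)^{1/2}/\tau _\mathrm{min} \), follows from \( \Vert \varOmega _{22}^{-1} \Vert _F^2 = \sum _i \tau _i^{-2} \le (N-n)/\tau _\mathrm{min}^2 \). A minimal numerical check on a synthetic SPD matrix (our own construction, standing in for \( \varOmega _{22} \)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = rng.standard_normal((n, n))
Omega = X @ X.T + n * np.eye(n)            # synthetic SPD stand-in for Omega_22
tau_min = np.linalg.eigvalsh(Omega).min()  # smallest eigenvalue
fro_inv = np.linalg.norm(np.linalg.inv(Omega), "fro")
# ||Omega^{-1}||_F^2 = sum_i tau_i^{-2} <= n / tau_min^2
assert fro_inv <= np.sqrt(n) / tau_min + 1e-9
```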
Lemma 2
For any \(a \ge b > 0\) and any \( \epsilon > 0,\) there exists \(\delta > 0\) such that \(|a-b| < \delta \) implies \(| \log a - \log b | < \epsilon \).
Proof
The Taylor series of \( f(x) = \log x\) at \(x=x_0\) is \( \log x = \log x_0 + \sum \nolimits _{t=1}^{\infty } \frac{(-1)^{t+1}}{t} (1-\frac{x_0}{x})^t.\) Thus,
By Leibniz's alternating series test, a series \( \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} u_t \) with \(u_t >0\) converges if the following two conditions are satisfied: 1. \(u_t \ge u_{t+1}\) for all \(t \ge N\), for some \(N \in \mathcal {N} \); 2. \( {\lim \nolimits _{t \rightarrow \infty }} u_t = 0.\)
In our case, \( \displaystyle u_t = \frac{1}{ta} (1-\frac{b}{a})^{t-1}. \) Let us check those conditions one after another.
Condition 0: for any \(\displaystyle a>b>0\), \( u_t = \frac{1}{ta} (1-\frac{b}{a})^{t-1} > 0.\)
Condition 1: for any \(\displaystyle a>b>0\), \( - \frac{b}{a} \le \frac{1}{t}, \) so \( \displaystyle \frac{1}{ta} (1-\frac{b}{a})^{t-1} \ge \frac{1}{(t+1)a} (1-\frac{b}{a})^t, \) which implies \(u_t \ge u_{t+1}.\)
Condition 2: for any \( a>b>0\), \( {\lim \nolimits _{t \rightarrow \infty }} u_t = {\lim \nolimits _{t \rightarrow \infty }} \frac{1}{ta} (1-\frac{b}{a})^{t-1} = 0. \)
Since all conditions are satisfied, the series converges; let \(S>0\) be such that \( | \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} \frac{1}{ta} (1-\frac{b}{a})^{t-1}| = S. \) Let \( \displaystyle \delta = \frac{\epsilon }{S}\). Then, for any \(a, b\) with \(|a-b| < \delta \), \( | \log a - \log b | = |a-b| \times | \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} \frac{1}{ta} (1-\frac{b}{a})^{t-1}| < \delta \times S = \frac{\epsilon }{S} \times S = \epsilon .\)
Note: \(S\) is bounded; by the mean value theorem, \( \displaystyle S = \frac{|\log a - \log b|}{|a-b|} = \frac{1}{\xi } \) for some \( \xi \) between \(b\) and \(a\), so \( \displaystyle S < \frac{1}{b}. \) \(\square \)
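For \(a > b > 0\), the expansion behind Lemma 2 gives \( \log a - \log b = (a-b)\sum \nolimits _{t=1}^{\infty } \frac{1}{ta}(1-\frac{b}{a})^{t-1} \), with every term positive in this range. A quick numerical sanity check of this identity; the values below are chosen purely for illustration:

```python
import math

def log_diff_series(a, b, terms=200):
    # Partial sum of (a - b) * sum_{t>=1} (1/(t*a)) * (1 - b/a)^(t-1)
    u = 1.0 - b / a
    s = sum((1.0 / (t * a)) * u ** (t - 1) for t in range(1, terms + 1))
    return (a - b) * s

# The partial sum should match log(a) - log(b) once the tail is negligible.
assert abs(log_diff_series(2.0, 1.0) - math.log(2.0)) < 1e-8
assert abs(log_diff_series(5.0, 3.0) - (math.log(5.0) - math.log(3.0))) < 1e-10
```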
Lemma 3
Let \( A\) be an \( n \times n\) matrix. If there exists \(M > 0\) such that \( \Vert A\Vert _F^2 < M \le n \), then for any \(k = 1, \ldots , m \) , we have \( \Vert A^k \Vert _F^2 < M,\) where m is any fixed integer.
Proof
Let \( \nu _i's\), \(i = 1, \ldots , n\), denote the eigenvalues of matrix \( A\). Since \( \Vert A\Vert _F^2 = \text {tr} (A^T A) = \sum \nolimits _{i=1}^n \nu _i^2 \le n |\nu _\mathrm{max} |^2 < M, \) then we have
\(\square \)
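In the application of Lemma 3 below, \( A = I_n - \varSigma _0^{-1}/\alpha \) is symmetric with spectral radius below one (by the choice \( \alpha > \lambda _1(\varSigma _0^{-1}) \)), under which the conclusion \( \Vert A^k \Vert _F^2 < M \) is immediate. A quick check under those conditions, on a synthetic symmetric matrix of our own construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
ev = rng.uniform(-0.95, 0.95, size=n)   # eigenvalues with |ev_i| < 1
A = (Q * ev) @ Q.T                      # symmetric matrix with spectrum ev
M = np.linalg.norm(A, "fro") ** 2 + 1e-9
# ||A^k||_F^2 = sum_i ev_i^(2k) <= sum_i ev_i^2 < M when |ev_i| < 1
for k in range(1, 6):
    assert np.linalg.norm(np.linalg.matrix_power(A, k), "fro") ** 2 < M
```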
Lemma 4
Let \(A_{n \times n} = \{a_{ij} \}_{n \times n},\ B_{n \times n} = \{ b_{ij} \}_{n \times n}\), and assume \( \Vert A\Vert _F^2< M = O(1) \le n, \ \Vert B\Vert _F^2 < M = O(1) \le n,\) then for any \(\epsilon _0 > 0,\) there exists \(\delta _0 > 0, \) such that for any \( A, B\) with \( \Vert A- B\Vert _F^2 < \delta _0, \) we have \( \Vert A^h - B^h \Vert _F^2 < \epsilon _0, \) where \(h=1, \ldots , m,\) and m is any fixed integer.
Proof
By using mathematical induction, we need to prove:
-
Step 1 If \( \Vert A- B\Vert _F^2 < \delta _0 \), then \( \Vert A^2 - B^2 \Vert _F^2 < \epsilon _0. \)
-
Step 2 For any \( h = 1, \dots , m-1,\) if \( \Vert A^{h-1} - B^{h-1} \Vert _F^2 < \delta _0 \), then \( \Vert A^{h} - B^{h} \Vert _F^2 < \epsilon _0. \)
We first prove Step 1. For any \(i, j = 1, \dots , n, ~~ | a_{ij} - b_{ij} | \le \max \nolimits _{i,j = 1, \dots , n} | a_{ij} - b_{ij} | \le \Vert A- B\Vert _F < \delta _0^{1/2} \) .
\( A^2 = A\times A= \left( \sum \nolimits _{k=1}^n a_{ik} a_{kj}\right) _{n \times n} \), and \( B^2 = B\times B= \left( \sum \nolimits _{k=1}^n b_{ik} b_{kj}\right) _{n \times n} \)
Let \( A^T A= \{ t_{ij} \}_{n \times n} \) and let \(\text {s}(A^T A) := \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n t_{ij} , \) then we have the following inequality:
Similarly, we can get \(\sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n \left[ \sum \nolimits _{k_1=1}^n \sum \nolimits _{k_2=1}^n b_{ik_1} b_{ik_2} \delta _0\right] < n^2 M \delta _0 \) , and \( 2 \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n \left[ \sum \nolimits _{k_1=1}^n \sum \nolimits _{k_2=1}^n a_{ik_1} b_{k_2 j} \delta _0 \right] < 2 n^2 M \delta _0 . \) Thus, from (8), we can derive \( \Vert A^2 - B^2 \Vert _F^2 < 4 n^2 M \delta _0. \)
Hence, for any \(\epsilon _0 > 0,\) there exists \( \displaystyle \delta _0 = \frac{\epsilon _0}{4 n^2 M} > 0,\) such that for any \(\Vert A- B\Vert _F^2 < \delta _0, \) we have \( \Vert A^2 - B^2 \Vert _F^2 < \epsilon _0. \text { (End of proving Step 1.) } \)
We now prove Step 2.
Let \( A^{h-1} = \{ c_{ij} \}_{n \times n}, B^{h-1} = \{ d_{ij} \}_{n \times n}. \)
\( \forall i, j = 1, \dots , n, ~~ | c_{ij} - d_{ij} | \le \max \nolimits _{i,j = 1, \dots , n} | c_{ij} - d_{ij} | \le \Vert A^{h-1} - B^{h-1} \Vert _F < \delta _0^{1/2}. \)
\( A^h = A\times A^{h-1} = \left( \sum \nolimits _{k=1}^n a_{ik} c_{kj}\right) _{n \times n} \), and \( B^h = B\times B^{h-1} = \left( \sum \nolimits _{k=1}^n b_{ik} d_{kj}\right) _{n \times n} \)
Similar to Proof of Step 1, we have
Hence, for any \(\epsilon _0 > 0,\) there exists \( \displaystyle \delta _0 = \frac{\epsilon _0}{4 n^2 M} > 0,\) such that for any \(A, B\) with \( \Vert A^{h-1} - B^{h-1} \Vert _F^2 < \delta _0, \) we have \( \Vert A^h - B^h \Vert _F^2 < \epsilon _0. \) (End of proving Step 2 and Lemma 4) \(\square \)
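A numerical spot-check of the Step 1 bound \( \Vert A^2 - B^2 \Vert _F^2 < 4 n^2 M \delta _0 \) on random matrices (our synthetic example; the \(4n^2\) factor is conservative, since \( \Vert A^2 - B^2 \Vert _F \le (\Vert A\Vert _F + \Vert B\Vert _F) \Vert A - B\Vert _F \) already gives \(4 M \delta _0\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 8, 8.0
A = rng.standard_normal((n, n))
A *= np.sqrt(0.9 * M) / np.linalg.norm(A, "fro")   # enforce ||A||_F^2 < M
B = A + 0.01 * rng.standard_normal((n, n))          # small perturbation of A
delta0 = np.linalg.norm(A - B, "fro") ** 2 * 1.01   # any delta0 > ||A-B||_F^2
lhs = np.linalg.norm(A @ A - B @ B, "fro") ** 2
assert lhs < 4 * n ** 2 * M * delta0
```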
Proof of Theorem 1
Let \(\alpha > \lambda _1 ( \varSigma _{0}^{-1} )\), where \( \lambda _1 ( \varSigma _{0}^{-1} )\) is the largest eigenvalue of \( \varSigma _{0}^{-1} \). Then, the exact loglikelihood can be derived using matrix Taylor expansion.
where \(\varSigma _0^{-1} \equiv \varOmega _{11} -\varOmega _{12} \varOmega _{22}^{-1} \varOmega _{21} \).
And the approximated loglikelihood using RMLE method can be written in the following way.
where \( \widetilde{\varSigma }_0 ^{-1} \equiv \varOmega _{11} -\varOmega _{12} ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \varOmega _{21} \).
To complete the proof of Theorem 1, under some assumptions and regularity conditions, we need to prove the following two steps.
-
Step 1 As \( \displaystyle N \rightarrow \infty , n \rightarrow \infty \) and \( \displaystyle \frac{n}{N} = c\), \(|\widetilde{T}_1 - T_1 | = o(n^{-1/2})\).
-
Step 2 As \( \displaystyle N \rightarrow \infty , n \rightarrow \infty , \frac{n}{N} = c,\) and \( p,m \rightarrow \infty \), \(|\widetilde{T}_2 - T_2 | = o(n^{-1/2})\).
We first prove Step 1, \(|\widetilde{T}_1 - T_1 | = o(n^{-1/2})\). From Theorem 9 of Wang et al. (2016), we have the following result.
where \(\displaystyle \eta = \frac{ \sum _{i=1}^{k} \tau _i^2 (\varOmega _{22}) }{\sum _{i=1}^{N-n} \tau _i^2 (\varOmega _{22}) } = \frac{\Vert \varOmega _{22,k} \Vert _F^2}{\Vert \varOmega _{22} \Vert _F^2} \), \(\tau _i (\varOmega _{22}) \) denotes the ith eigenvalue of \( \varOmega _{22}\), \(i=1, \dots , N-n, \) and \( \varOmega _{22,k} \) is the best rank-k approximation to \( \varOmega _{22}. \) Here, we assume \(\epsilon _1 = o(n^{-7/2}). \)
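The ratio \(\eta \) and the column-based SPSD approximation behind \( {\tilde{ \varOmega }_{22}} ^{ss} \) can be sketched as follows. This is a generic Nyström-type reconstruction \( C W^{+} C^{T} \) on a synthetic low-rank SPSD matrix, not the exact sketching model of Wang et al. (2016); the names and sizes are ours:

```python
import numpy as np

rng = np.random.default_rng(3)
N_minus_n, r, k = 50, 5, 10
X = rng.standard_normal((N_minus_n, r))
K = X @ X.T                                    # SPSD matrix of rank r

tau = np.linalg.eigvalsh(K)[::-1]              # eigenvalues, descending
eta = np.sum(tau[:k] ** 2) / np.sum(tau ** 2)  # ||K_k||_F^2 / ||K||_F^2
assert 0.999 < eta <= 1.0                      # k >= rank(K), so eta ~ 1

idx = np.arange(k)                             # a selected column subset
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_ss = C @ np.linalg.pinv(W) @ C.T             # Nystrom-type approximation
# Exact recovery holds when rank(W) = rank(K)
assert np.allclose(K_ss, K, atol=1e-6)
```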
Under Assumption 1, \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, so there exists \( \displaystyle M_1 = \frac{ (N-n)^{1/2}}{\tau _\mathrm{min}} > 0\) such that \(\Vert \varOmega _{22}^{-1} \Vert _F \le M_1 \). By Lemma 1, there exists \( \displaystyle M_2 = \frac{ (N-n)^{1/2} }{\sigma _\mathrm{min}} > 0\) such that \( \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \le M_2,\) where \( \tau _\mathrm{min}\) and \(\sigma _\mathrm{min}\) are the smallest eigenvalues of \(\varOmega _{22}\) and \( {\tilde{ \varOmega }_{22}} ^{ss} \), respectively.
Hence, we have
Further, we can derive
Note that \(c_1\) is bounded as \( N \rightarrow \infty \) and \( n \rightarrow \infty \) based on Assumption 2.
Let \( a = \max \{Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}, Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} \} \) and \( b = \min \{Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}, Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} \} \). By Lemma 2, there exists \(S_1 >0\) such that \( | \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} \frac{1}{ta} (1-\frac{b}{a})^{t-1}| = S_1. \) Notice that neither \( \widetilde{ \varSigma }_{0}^{-1} \) nor \( \varSigma _{0} ^{-1} \) is sparse, since neither \( ( {{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \) nor \( { \varOmega }_{22} ^{-1} \) is sparse. Without loss of generality, let \( D_{n \times n} = \{ d_{ij} \}_{n \times n} \) denote either \( \widetilde{\varSigma }_{0}^{-1} \) or \( \varSigma _{0} ^{-1} \). Then, \( Y_\mathrm{O} ^{T} DY_\mathrm{O} = \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n d_{ij} y_i y_j = O(n^2). \)
And thus we have
We now prove Step 2, \(|\widetilde{T}_2 - T_2 | = o(n^{-1/2})\). To complete the proof, we need to first introduce a transitional loglikelihood.
\( | \widetilde{T}_2 - T_2 | = | \widetilde{T}_2 - \widehat{T}_2 + \widehat{T}_2 - T_2 | \le | \widetilde{T}_2 - \widehat{T}_2 | + | \widehat{T}_2 - T_2 | \). Next, we need to prove \( | \widehat{T}_2 - T_2 | = o(n^{-1/2}) \) and \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{-1/2}) \) , respectively.
We first prove \( | \widehat{T}_2 - T_2 | = o(n^{-1/2})\) as follows:
From Lemma 7 in Boutsidis et al. (2015), we can get:
-
1.
With \(\delta = 0.01, p = 20 \ln (2/\delta ) / \epsilon ^2\), and \( \epsilon = o(n^{-3/2})\), with probability at least 0.99, \(\varGamma _1 \le \epsilon \times \text {tr} [\sum \nolimits _{k=1}^\infty ( I_n - \frac{\varSigma _0^{-1}}{\alpha } )^k / k ] .\)
-
2.
Let \( \displaystyle \kappa ( \varSigma _0^{-1} ) = \frac{\lambda _1(\varSigma _0^{-1})}{ \lambda _n (\varSigma _0^{-1}) } \ge 1 \) and \( \epsilon = o(n^{-3/2}),\) where \( \lambda _i(\varSigma _0^{-1}) \) denotes the ith largest eigenvalue of \(\varSigma _0^{-1}\). From Boutsidis et al. (2015), we set
$$\begin{aligned} m= & {} O \bigg [ \log \bigg ( \frac{ \log (\kappa (\varSigma _0^{-1}) ) }{2 \epsilon \log (5 \, \kappa ( \varSigma _0^{-1} ) ) } \bigg ) \times \kappa (\varSigma _0^{-1}) \bigg ] \\= & {} O \Big [ \log \left( \frac{1}{2 \epsilon } \right) \Big ] = O \bigg [ \log \bigg ( \frac{1}{2 \times o(n^{- 3/2})} \bigg ) \bigg ] \\= & {} O \big ( \log \big [ O (n^{3/2}) \big ] \big ) = O \big ( n^{ 3 \bar{\epsilon } /2 } \big ) \le O \big ( n^{3/2 } \big ), \end{aligned}$$where \( \bar{\epsilon } \) is fixed and \( 0< \bar{\epsilon } < 1. \) Then, we can get
$$\begin{aligned} \varGamma _2\le & {} \left[ 1 - \frac{\lambda _n ( \varSigma _0 ^{-1})}{\alpha } \right] ^ m \times \displaystyle \sum _{k=1}^\infty \frac{1}{k} \ \text {tr} \left[ \left( I_n - \frac{\varSigma _0^{-1} }{ \alpha } \right) ^k \right] \\\le & {} \epsilon \times \displaystyle \sum _{i=1}^n \log \bigg ( \frac{\alpha }{\lambda _i (\varSigma _0^{-1} )} \bigg ) = o \big (n^{- 3/2 } \big ) \times O (n) \\= & {} o(n^{- 1/2 }). \end{aligned}$$
Hence, \( | \widehat{T}_2 - T_2 | = o(n^{- 1/2 })\).
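The argument above bounds the error of the randomized log-determinant estimator of Boutsidis et al. (2015): truncate \( \log \det \varSigma _0^{-1} = n \log \alpha - \sum \nolimits _{k=1}^{\infty } \frac{1}{k} \, \text {tr} [ ( I_n - \varSigma _0^{-1}/\alpha )^k ] \) at m terms, and estimate each trace with p Gaussian probes \(g_i\). A minimal sketch on a synthetic SPD matrix (our own implementation, not the paper's code; parameter values are illustrative):

```python
import numpy as np

def rand_logdet(C, alpha, m=60, p=800, seed=0):
    """Randomized log-determinant of SPD C with eigenvalues in (0, alpha).

    logdet(C) = n*log(alpha) - sum_{k=1}^m (1/k) tr[(I - C/alpha)^k],
    each trace estimated by Hutchinson probes g_i ~ N(0, I_n).
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    G = rng.standard_normal((n, p))       # probe vectors as columns
    V = G - (C @ G) / alpha               # V = (I - C/alpha) @ G
    total = 0.0
    for k in range(1, m + 1):
        total += np.sum(G * V) / (p * k)  # avg of g_i^T (I - C/alpha)^k g_i
        V = V - (C @ V) / alpha           # advance to the next matrix power
    return n * np.log(alpha) - total

rng = np.random.default_rng(4)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = rng.uniform(0.8, 1.2, size=n)
C = (Q * lam) @ Q.T                       # SPD with spectrum in [0.8, 1.2]
exact = np.linalg.slogdet(C)[1]
assert abs(rand_logdet(C, alpha=2.0) - exact) < 1.0
```

Only matrix-vector style products with C appear, which is what makes the estimator attractive for large sparse precision matrices.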
We then prove \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{-1/2}). \)
Let \( \displaystyle A= I_n - \frac{\varSigma _0^{-1}}{\alpha } \), \( \displaystyle B= I_n - \frac{ \widetilde{\varSigma }_0^{-1}}{\alpha } \), and \( A^k - B^k = Q= \{q_{k_1 k_2} \}_{n \times n}. \)
Then, \( | \widetilde{T}_2 - \widehat{T}_2 | \) can be written as
Then, we have
By Lemma 4, we have that for any \( \displaystyle k = 1, \ldots , m\), \( \Vert A^k - B^k \Vert _F^2 < \frac{1}{\alpha ^2} d_1^2 \epsilon _1^2 \, 4 n^2 M = o(n^{-5}), \) where m is a fixed integer. Then \( \Vert Q\Vert _F \ge \Vert Q\Vert _2 \ge \frac{1}{n^{1/2}} \Vert Q\Vert _1 = \frac{1}{ n^{1/2} } \max \nolimits _{1 \le j \le n} \sum \nolimits _{i=1}^n |q_{ij} | \ge \frac{1}{ n^{3/2} } \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n | q_{ij} |. \)
Thus, for any \(g_i = ( g_{i1}, \dots , g_{in})^T \sim N( 0, I_n),\) we have
Further, for any \(p \ge 1, \sum \nolimits _{i=1}^p \frac{1}{p} \ g_i^T Qg_i = o(n^{-2})\). Then, for any \( m = O(n^{3/2}), \ \sum \nolimits _{k=1}^m \frac{1}{k} \ ( \sum \nolimits _{i=1}^p \frac{1}{p} \ g_i^T Qg_i ) = o(n^{- 1/2 })\). Hence, \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{- 1/2 }). \)
Therefore, \( | \widetilde{T}_2 - {T}_2 | \le | \widetilde{T}_2 - \widehat{T}_2 | + | \widehat{T}_2 - T_2| = o(n^{- 1/2 }). \text { (End of proving Step 2.)} \)
Lastly, we have
\(\square \)
A.2 Proof of Theorem 2
Proof
For simplicity, we denote the exact loglikelihood \( \log f_{Y_\mathrm{O}} (\rho )\) by \(l (\rho )\) and the approximated loglikelihood from the RMLE method, \(\log _\mathrm{RMLE} f_{Y_\mathrm{O}} (\rho ) \), by \( \tilde{l} (\rho )\); their first derivatives are written \( l^{'} (\rho ) \) and \( \tilde{l} ^{'} (\rho )\), and their second derivatives \( l^{''} (\rho ) \) and \( \tilde{l} ^{''} (\rho )\), respectively.
Then, from Theorem 1 and Assumption 6, we are able to derive the following:
By conducting Taylor expansions of \( l^{'} (\rho ) \) and \( \tilde{l} ^{'} (\rho )\) at point \(\rho = \rho _1\), we can get:
By solving the equations, we can obtain:
Hence, we can obtain the following
On the other hand, from asymptotic theory of MLE, we have
Hence,
(End of proving Theorem 2.) \(\square \)
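The stability logic behind Theorem 2 — if the approximated loglikelihood is uniformly close to the exact one and both are smooth and locally concave, then their maximizers are close — can be illustrated with a toy example (the functions below are illustrative stand-ins of our own choosing, not the SAR loglikelihood):

```python
import numpy as np

rho = np.linspace(-0.9, 0.9, 2001)
l = -(rho - 0.3) ** 2                   # toy concave "exact" loglikelihood
l_tilde = l + 1e-4 * np.sin(50 * rho)   # uniformly small perturbation of l
rho_hat = rho[np.argmax(l)]
rho_tilde = rho[np.argmax(l_tilde)]
assert abs(rho_tilde - rho_hat) < 0.05  # maximizers nearly coincide
```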
Cite this article
Li, M., Kang, E.L. Randomized algorithms of maximum likelihood estimation with spatial autoregressive models for large-scale networks. Stat Comput 29, 1165–1179 (2019). https://doi.org/10.1007/s11222-019-09862-4