Randomized algorithms of maximum likelihood estimation with spatial autoregressive models for large-scale networks


Abstract

The spatial autoregressive (SAR) model is a classical model in spatial econometrics and has become an important tool in network analysis. However, with large-scale networks, existing methods of likelihood-based inference for the SAR model become computationally infeasible. We here investigate maximum likelihood estimation for the SAR model with partially observed responses from large-scale networks. By taking advantage of recent developments in randomized numerical linear algebra, we derive efficient algorithms to estimate the spatial autocorrelation parameter in the SAR model. Compelling experimental results from extensive simulation and real data examples demonstrate empirically that the estimator obtained by our method, called the randomized maximum likelihood estimator, outperforms the state of the art by giving smaller bias and standard error, especially for large-scale problems with moderate spatial autocorrelation. The theoretical properties of the estimator are explored, and consistency results are established.
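As context for the abstract, the following is a minimal sketch (illustrative only, not the authors' code) of likelihood-based estimation of the spatial autocorrelation parameter \(\rho \) in a simplified SAR model \(Y = \rho W Y + \epsilon \) with fully observed responses, no covariates, and a row-normalized weight matrix; all of these simplifications are assumptions of this sketch. The \(O(N^3)\) log-determinant evaluation inside the loglikelihood is exactly what becomes infeasible for large-scale networks and what the randomized algorithms of the paper avoid.

    # Minimal sketch: grid-search MLE of rho in a simplified SAR model
    #     Y = rho * W @ Y + eps,   eps ~ N(0, sigma^2 I),
    # with a row-normalized random adjacency matrix W (toy setup, not the paper's).
    import numpy as np

    rng = np.random.default_rng(0)
    N, rho_true, sigma = 200, 0.4, 1.0

    # Random symmetric adjacency, row-normalized.
    A = (rng.random((N, N)) < 0.05).astype(float)
    A = np.triu(A, 1); A = A + A.T
    W = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)

    # Simulate Y = (I - rho W)^{-1} eps.
    I = np.eye(N)
    Y = np.linalg.solve(I - rho_true * W, sigma * rng.standard_normal(N))

    def profile_loglik(rho):
        """Profile loglikelihood of rho (sigma^2 profiled out)."""
        S = I - rho * W
        e = S @ Y
        sign, logdet = np.linalg.slogdet(S)   # O(N^3): the bottleneck at scale
        return logdet - 0.5 * N * np.log(e @ e / N)

    grid = np.linspace(-0.9, 0.9, 181)
    rho_hat = grid[np.argmax([profile_loglik(r) for r in grid])]
    print(f"true rho = {rho_true}, grid MLE = {rho_hat:.2f}")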


References

  • Anselin, L., Bera, A.K.: Spatial dependence in linear regression models with an introduction to spatial econometrics. Stat. Textb. Monogr. 155, 237–290 (1998)

  • Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(4), 825–848 (2008)

  • Banerjee, S., Carlin, B.P., Gelfand, A.E.: Hierarchical Modeling and Analysis for Spatial Data. CRC Press, Boca Raton (2014)

  • Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1–3), 41–54 (1999)

  • Beck, N., Gleditsch, K.S., Beardsley, K.: Space is more than geography: using spatial econometrics in the study of political economy. Int. Stud. Q. 50(1), 27–44 (2006)

  • Boutsidis, C., Drineas, P., Kambadur, P., Kontopoulou, E.M., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv preprint arXiv:1503.00374 (2015)

  • Browne, K.: Snowball sampling: using social networks to research non-heterosexual women. Int. J. Soc. Res. Methodol. 8(1), 47–60 (2005)

  • Burden, S., Cressie, N., Steel, D.G.: The SAR model for very large datasets: a reduced rank approach. Econometrics 3(2), 317–338 (2015)

  • Chen, X., Chen, Y., Xiao, P.: The impact of sampling and network topology on the estimation of social intercorrelations. J. Market. Res. 50(1), 95–110 (2013)

  • Cressie, N., Johannesson, G.: Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 209–226 (2008)

  • Darmofal, D.: Spatial Analysis for the Social Sciences. Cambridge University Press, Cambridge (2015)

  • Doreian, P.: Estimating linear models with spatially distributed data. Sociol. Methodol. 12, 359–388 (1981)

  • Doreian, P., Freeman, L., White, D., Romney, A.: Models of network effects on social actors. In: Research Methods in Social Network Analysis, pp. 295–317 (1989)

  • Fujimoto, K., Chou, C.P., Valente, T.W.: The network autocorrelation model using two-mode data: affiliation exposure and potential bias in the autocorrelation parameter. Soc. Netw. 33(3), 231–243 (2011)

  • Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp. 1207–1214 (2012)

  • Haggett, P.: Hybridizing alternative models of an epidemic diffusion process. Econ. Geogr. 52(2), 136–146 (1976)

  • Lee, L.F.: Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72(6), 1899–1925 (2004)

  • Lee, L.F., Liu, X.: Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econ. Theory 26(1), 187–230 (2010)

  • Lee, L., Yu, J.: Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154(2), 165–185 (2010)

  • Lee, L.F., Liu, X., Lin, X.: Specification and estimation of social interaction models with network structures. Econ. J. 13(2), 145–176 (2010)

  • Leenders, R.T.: Modeling social influence through network autocorrelation: constructing the weight matrix. Soc. Netw. 24(1), 21–47 (2002)

  • LeSage, J., Pace, R.K.: Introduction to Spatial Econometrics. Chapman and Hall, Boca Raton (2009)

  • LeSage, J.P., Pace, R.K.: Models for spatially dependent missing data. J. Real Estate Financ. Econ. 29(2), 233–254 (2004)

  • Leskovec, J., Krevl, A.: SNAP Datasets: Stanford Large Network Dataset Collection (2014)

  • Lichstein, J.W., Simons, T.R., Shriner, S.A., Franzreb, K.E.: Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72(3), 445–463 (2002)

  • Lin, X., Lee, L.F.: GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157(1), 34–52 (2010)

  • Mahoney, M.W.: Randomized algorithms for matrices and data. Found. Trends Mach. Learn. 3(2), 123–224 (2011)

  • O’Malley, A.J.: The analysis of social network data: an exciting frontier for statisticians. Stat. Med. 32(4), 539–555 (2013)

  • Ord, K.: Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 70(349), 120–126 (1975)

  • OSC: Ohio Supercomputer Center. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73 (1987). Accessed 21 Dec 2018

  • Pace, R.K., Barry, R.: Sparse spatial autoregressions. Stat. Probab. Lett. 33(3), 291–297 (1997)

  • Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2006)

  • Robins, G.: A tutorial on methods for the modeling and analysis of social network data. J. Math. Psychol. 57(6), 261–274 (2013)

  • Robins, G., Pattison, P., Elliott, P.: Network models for social influence processes. Psychometrika 66(2), 161–189 (2001)

  • Shao, J.: Mathematical Statistics. Springer, New York (2003)

  • Smirnov, O., Anselin, L.: Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput. Stat. Data Anal. 35(3), 301–319 (2001)

  • Smirnov, O.A.: Computation of the information matrix for models with spatial interaction on a lattice. J. Comput. Graph. Stat. 14(4), 910–927 (2005)

  • Stewart, G.: Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix. Numer. Math. 83(2), 313–323 (1999)

  • Suesse, T.: Estimation of spatial autoregressive models with measurement error for large data sets. Comput. Stat. 33(4), 1627–1648 (2018)

  • Suesse, T.: Marginal maximum likelihood estimation of SAR models with missing data. Comput. Stat. Data Anal. 120, 98–110 (2018)

  • Suesse, T., Chambers, R.: Using social network information for survey estimation. J. Off. Stat. 34(1), 181–209 (2018)

  • Suesse, T., Zammit-Mangion, A.: Computational aspects of the EM algorithm for spatial econometric models with missing data. J. Stat. Comput. Simul. 87(9), 1767–1786 (2017)

  • Sun, D., Tsutakawa, R.K., Speckman, P.L.: Posterior distribution of hierarchical models using CAR(1) distributions. Biometrika 86(2), 341–350 (1999)

  • Wang, S., Luo, L., Zhang, Z.: SPSD matrix approximation via column selection: theories, algorithms, and extensions. J. Mach. Learn. Res. 17(49), 1–49 (2016)

  • Wang, W., Lee, L.F.: Estimation of spatial autoregressive models with randomly missing data in the dependent variable. Econ. J. 16(1), 73–102 (2013)

  • Whittle, P.: On stationary processes in the plane. Biometrika 41, 434–449 (1954)

  • Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci. 10(1–2), 1–157 (2014)

  • Zhou, J., Tu, Y., Chen, Y., Wang, H.: Estimating spatial autocorrelation with sampled network data. J. Bus. Econ. Stat. 35(1), 130–138 (2017)


Acknowledgements

We would like to thank the associate editor and two reviewers of Statistics and Computing for their insightful comments that greatly improved this work. Li’s work is partially supported by the Henry Laws Fellowship Award and the Taft Research Center at the University of Cincinnati. Kang’s research is partially supported by the Simons Foundation Collaboration Award (#317298) and the Taft Research Center at the University of Cincinnati. This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center (OSC 1987). We would like to thank Dr. Shan Ba, Dr. Won Chang, Dr. Noel Cressie, Dr. Alex B. Konomi, and Dr. Siva Sivaganesan for their helpful suggestions.

Author information

Corresponding author

Correspondence to Emily L. Kang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix contains the proofs of the theorems and lemmas in the paper.

A.1 Proof of Theorem 1

To prove Theorem 1, we first need to state and prove four lemmas.

Lemma 1

Assume that \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded for any n and N with \(n < N\). Then \(\Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \) is also bounded.

Proof

Since \( \varOmega _{22} \) is a symmetric positive semi-definite (SPSD) matrix, all its eigenvalues \( \tau _i \) are nonnegative. Let \( \tau _\mathrm{min}\) be the smallest eigenvalue of \(\varOmega _{22}. \) Since \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, there exists \( M_1 > 0\) such that

$$\begin{aligned} \Vert \varOmega _{22}^{-1} \Vert _F= & {} \{ \text {tr} [ ( \varOmega _{22}^{-1} )^{T} \varOmega _{22}^{-1} ] \} ^ {1/2} = \left( \displaystyle \sum _{i=1}^{N-n} \frac{1}{\tau _i^2} \right) ^{1/2} \\\le & {} \left( \displaystyle \sum _{i=1}^{N-n} \frac{1}{\tau _\mathrm{min}^2} \right) ^{1/2} = \left( \frac{N-n}{ \tau _\mathrm{min}^2} \right) ^{1/2} \\= & {} \frac{(N-n)^{1/2}}{\tau _\mathrm{min}} = M_1. \end{aligned}$$

Because \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, \( \varOmega _{22} \) is invertible, so \(\tau _\mathrm{min} \ne 0\) and therefore \(\tau _i > 0\) for \(i = 1, \dots , N-n.\) Thus, \( \varOmega _{22} \) is symmetric positive definite (SPD). From Theorem 6 of Wang et al. (2016), \( {\tilde{ \varOmega }_{22}} ^{ss} \) is also SPD; thus, all its eigenvalues \(\sigma _i\), \(i=1,\dots ,N-n,\) are positive. Let \(\sigma _\mathrm{min}\) be the smallest eigenvalue of \( {\tilde{ \varOmega }_{22}} ^{ss} \). Then, there exists \( \displaystyle M_2 = \frac{(N-n)^{1/2}}{\sigma _\mathrm{min}} > 0\) such that

$$\begin{aligned} \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F= & {} \left( \displaystyle \sum _{i=1}^{N-n} \frac{1}{\sigma _i^2} \right) ^{1/2} \le \left( \displaystyle \sum _{i=1}^{N-n} \frac{1}{\sigma _\mathrm{min}^2} \right) ^{1/2} \\= & {} \left( \frac{N-n}{ \sigma _\mathrm{min}^2} \right) ^{1/2} = \frac{(N-n)^{1/2}}{\sigma _\mathrm{min}} = M_2. ~~~ \end{aligned}$$

\(\square \)
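The eigenvalue identity behind this proof is easy to verify numerically. A small illustrative sketch (the SPD test matrix is an arbitrary choice, not from the paper):

    # Check: for SPD Omega, ||Omega^{-1}||_F = (sum_i 1/tau_i^2)^{1/2},
    # which is bounded by (N - n)^{1/2} / tau_min as in Lemma 1.
    import numpy as np

    rng = np.random.default_rng(1)
    m = 50                                   # plays the role of N - n
    G = rng.standard_normal((m, m))
    Omega = G @ G.T + m * np.eye(m)          # SPD by construction

    tau = np.linalg.eigvalsh(Omega)
    lhs = np.linalg.norm(np.linalg.inv(Omega), "fro")
    rhs = np.sqrt(np.sum(1.0 / tau**2))
    print(np.isclose(lhs, rhs))              # True
    print(lhs <= np.sqrt(m) / tau.min())     # True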

Lemma 2

For any \( \epsilon > 0,\) there exists \(\delta > 0\) such that for any \(|a-b| < \delta \), we have \(| \log a - \log b | < \epsilon \).

Proof

The Taylor series of \( f(x) = \log x\) about \(x=x_0\) can be written as \( \log x = \log x_0 + \sum \nolimits _{t=1}^{\infty } \frac{1}{t} (1-\frac{x_0}{x})^t.\) Without loss of generality, assume \(a> b > 0\). Then,

$$\begin{aligned} | \log a - \log b |= & {} \displaystyle \sum _{t=1}^{\infty } \frac{1}{t} \left( 1-\frac{b}{a}\right) ^t \\= & {} \displaystyle \sum _{t=1}^{\infty } \frac{1}{t} \left( 1-\frac{b}{a}\right) ^{t-1} \frac{1}{a} (a-b) \\= & {} |a-b| \times \displaystyle \sum _{t=1}^{\infty } \frac{1}{ta} \left( 1-\frac{b}{a}\right) ^{t-1} . \end{aligned}$$

Since \(0 \le 1-\frac{b}{a} < 1\), the series on the right has positive terms and is dominated by the geometric series \( \frac{1}{a} \sum \nolimits _{t=1}^{\infty } (1-\frac{b}{a})^{t-1} = \frac{1}{b}\); by the comparison test, it converges to some \(S\) with \(0 < S \le \frac{1}{b}\).

Let \( \displaystyle \delta = \frac{\epsilon }{S}\). Then, for any \(|a-b| < \delta \), \( | \log a - \log b | = |a-b| \times S < \delta \times S = \frac{\epsilon }{S} \times S = \epsilon .\) \(\square \)
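The expansion and the geometric bound can be checked numerically; an illustrative sketch (the values of a and b are arbitrary):

    # Check: for a > b > 0, log a - log b = sum_{t>=1} (1/t)(1 - b/a)^t,
    # and the factored positive series S is bounded by 1/b as in Lemma 2.
    import numpy as np

    a, b = 2.0, 1.5
    t = np.arange(1, 200)
    series = np.sum((1.0 - b / a) ** t / t)
    print(np.isclose(series, np.log(a) - np.log(b)))        # True

    S = np.sum((1.0 - b / a) ** (t - 1) / (t * a))          # factored series
    print(np.isclose((a - b) * S, np.log(a) - np.log(b)))   # True
    print(S <= 1.0 / b)                                     # geometric bound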

Lemma 3

Let \( A\) be an \( n \times n\) symmetric matrix whose eigenvalues all lie in \([0, 1)\). If there exists \(M > 0\) such that \( \Vert A\Vert _F^2 < M \), then for any \(k = 1, \ldots , m \), we have \( \Vert A^k \Vert _F^2 < M,\) where m is any fixed integer.

Proof

Let \( \nu _i\), \(i = 1, \ldots , n\), denote the eigenvalues of the symmetric matrix \( A\), so that \(0 \le \nu _i < 1\) and \( \Vert A\Vert _F^2 = \text {tr} (A^T A) = \sum \nolimits _{i=1}^n \nu _i^2 < M. \) The eigenvalues of \(A^k\) are \(\nu _i^k\), and \(\nu _i^{2k} \le \nu _i^2\) for every \(i\) and \(k \ge 1\). Hence,

$$\begin{aligned} \Vert A^k \Vert _F^2 = \text {tr} [ (A^k)^T A^k ] = \displaystyle \sum _{i=1}^n \nu _i^{2k} \le \displaystyle \sum _{i=1}^n \nu _i^2 < M. ~~ \end{aligned}$$

Note that these conditions hold for the matrices to which the lemma is applied below, namely \( A = I_n - \varSigma _{0}^{-1}/\alpha \) with \( \alpha > \lambda _1 ( \varSigma _{0}^{-1} )\).

\(\square \)
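An illustrative numerical check (the random symmetric test matrix with eigenvalues in \([0,1)\) mirrors the matrices \(I_n - \varSigma _0^{-1}/\alpha \) to which the lemma is applied):

    # Check: for symmetric A with eigenvalues in [0, 1) and ||A||_F^2 < M,
    # powers cannot increase the bound, as claimed in Lemma 3.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    nu = rng.uniform(0.0, 0.9, size=n)        # eigenvalues in [0, 1)
    A = (Q * nu) @ Q.T                        # symmetric, ||A||_F^2 < n

    M = n
    print(np.linalg.norm(A, "fro") ** 2 < M)
    for k in range(1, 6):
        print(k, np.linalg.norm(np.linalg.matrix_power(A, k), "fro") ** 2 < M)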

Lemma 4

Let \(A_{n \times n} = \{a_{ij} \}_{n \times n}\) and \( B_{n \times n} = \{ b_{ij} \}_{n \times n}\) satisfy the conditions of Lemma 3, and assume \( \Vert A\Vert _F^2< M = O(1)\) and \( \Vert B\Vert _F^2 < M.\) Then for any \(\epsilon _0 > 0,\) there exists \(\delta _0 > 0 \) such that for any \( A, B\) with \( \Vert A- B\Vert _F^2 < \delta _0, \) we have \( \Vert A^h - B^h \Vert _F^2 < \epsilon _0 \) for \(h=1, \ldots , m,\) where m is any fixed integer.

Proof

We argue by mathematical induction on the power \(h\); it suffices to prove:

  • Step 1 (base case) If \( \Vert A- B\Vert _F^2 < \delta _0 \), then \( \Vert A^2 - B^2 \Vert _F^2 < \epsilon _0. \)

  • Step 2 (inductive step) For any \( h = 3, \dots , m,\) if \( \Vert A^{h-1} - B^{h-1} \Vert _F^2 < \delta _0 \), then \( \Vert A^{h} - B^{h} \Vert _F^2 < \epsilon _0. \)

We first prove Step 1. For any \(i, j = 1, \dots , n, ~~ | a_{ij} - b_{ij} | \le \max \nolimits _{i,j = 1, \dots , n} | a_{ij} - b_{ij} | \le \Vert A- B\Vert _F < \delta _0^{1/2} \) .

\( A^2 = A\times A= \left( \sum \nolimits _{k=1}^n a_{ik} a_{kj}\right) _{n \times n} \), and \( B^2 = B\times B= \left( \sum \nolimits _{k=1}^n b_{ik} b_{kj}\right) _{n \times n} \)

$$\begin{aligned}&\Vert A^2 - B^2 \Vert _F^2\nonumber \\&\quad = \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left[ \sum _{k=1}^n (a_{ik} a_{kj} - b_{ik} b_{kj} ) \right] ^2 \nonumber \\&\quad = \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left[ \sum _{k=1}^n (a_{ik} a_{kj} - a_{ik} b_{kj} ) + \sum _{k=1}^n (a_{ik}b_{kj} - b_{ik} b_{kj} ) \right] ^2 \nonumber \\&\quad = \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left\{ \sum _{k_1=1}^n \sum _{k_2=1}^n \left[ a_{ik_1} (a_{k_1j} - b_{k_1j} ) a_{ik_2} ( a_{k_2j} - b_{k_2j} ) \right] \right. \nonumber \\&\qquad \left. + \sum _{k_1=1}^n \sum _{k_2=1}^n \left[ (a_{ik_1} - b_{ik_1}) b_{k_1j} (a_{ik_2} - b_{ik_2}) b_{k_2j} \right] \right. \nonumber \\&\qquad \left. +\, 2 \displaystyle \sum _{k_1=1}^n \sum _{k_2=1}^n \left[ a_{ik_1} (a_{k_1j} - b_{k_1j}) (a_{ik_2} - b_{ik_2}) b_{k_2j} \right] \right\} \nonumber \\&\quad \le \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left\{ \sum _{k_1=1}^n \sum _{k_2=1}^n |a_{ik_1}| |a_{ik_2}| \delta _0 + \sum _{k_1=1}^n \sum _{k_2=1}^n |b_{k_1j}| |b_{k_2j}| \delta _0 \right. \nonumber \\&\qquad \left. +\, 2 \sum _{k_1=1}^n \sum _{k_2=1}^n |a_{ik_1}| |b_{k_2j}| \delta _0 \right\} \end{aligned}$$
(8)

By the elementary inequality \( |a_{ik_1}| |a_{ik_2}| \le \frac{1}{2} ( a_{ik_1}^2 + a_{ik_2}^2 ) \), we have the following bound:

$$\begin{aligned}&\displaystyle \sum _{i=1}^n \sum _{j=1}^n \left[ \sum _{k_1=1}^n \sum _{k_2=1}^n |a_{ik_1}| |a_{ik_2}| \delta _0 \right] \\&\quad \le \frac{\delta _0}{2} \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left[ \sum _{k_1=1}^n \sum _{k_2=1}^n ( a_{ik_1}^2 + a_{ik_2}^2 ) \right] \\&\quad = \frac{\delta _0}{2} \left[ \sum _{j=1}^n \sum _{k_2=1}^n \left( \sum _{i=1}^n \sum _{k_1=1}^n a_{ik_1}^2 \right) + \sum _{j=1}^n \sum _{k_1=1}^n \left( \sum _{i=1}^n \sum _{k_2=1}^n a_{ik_2}^2 \right) \right] \\&\quad < \frac{\delta _0}{2} [ n^2 M + n^2 M ] = n^2 M \delta _0. \end{aligned}$$

Similarly, we can get \(\sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n \left[ \sum \nolimits _{k_1=1}^n \sum \nolimits _{k_2=1}^n |b_{k_1j}| |b_{k_2j}| \delta _0\right] < n^2 M \delta _0 \) and \( 2 \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n \left[ \sum \nolimits _{k_1=1}^n \sum \nolimits _{k_2=1}^n |a_{ik_1}| |b_{k_2 j}| \delta _0 \right] < 2 n^2 M \delta _0 . \) Thus, from (8), we can derive \( \Vert A^2 - B^2 \Vert _F^2 < 4 n^2 M \delta _0. \)

Hence, for any \(\epsilon _0 > 0,\) there exists \( \displaystyle \delta _0 = \frac{\epsilon _0}{4 n^2 M} > 0,\) such that for any \(\Vert A- B\Vert _F^2 < \delta _0, \) we have \( \Vert A^2 - B^2 \Vert _F^2 < \epsilon _0. \) This completes Step 1.

We now prove Step 2.

Let \( A^{h-1} = \{ c_{ij} \}_{n \times n}, B^{h-1} = \{ d_{ij} \}_{n \times n}. \)

\( \forall i, j = 1, \dots , n, ~~ | c_{ij} - d_{ij} | \le \max \nolimits _{i,j = 1, \dots , n} | c_{ij} - d_{ij} | \le \Vert A^{h-1} - B^{h-1} \Vert _F < \delta _0^{1/2}. \)

\( A^h = A\times A^{h-1} = \left( \sum \nolimits _{k=1}^n a_{ik} c_{kj}\right) _{n \times n} \), and \( B^h = B\times B^{h-1} = \left( \sum \nolimits _{k=1}^n b_{ik} d_{kj}\right) _{n \times n} \)

Similarly to the proof of Step 1, we have

$$\begin{aligned}&\Vert A^h - B^h \Vert _F^2\\&\quad = \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left[ \sum _{k=1}^n (a_{ik} c_{kj} - b_{ik} d_{kj} ) \right] ^2\\&\quad = \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left\{ \sum _{k_1=1}^n \sum _{k_2=1}^n \left[ a_{ik_1} (c_{k_1j} - d_{k_1j} ) a_{ik_2} ( c_{k_2j} - d_{k_2j} ) \right] \right. \\&\left. \qquad + \sum _{k_1=1}^n \sum _{k_2=1}^n \left[ (a_{ik_1} - b_{ik_1}) d_{k_1j} (a_{ik_2} - b_{ik_2}) d_{k_2j} \right] \right. \\&\left. \qquad + 2 \displaystyle \sum _{k_1=1}^n \sum _{k_2=1}^n \left[ a_{ik_1} (c_{k_1j} - d_{k_1j}) (a_{ik_2} - b_{ik_2}) d_{k_2j} \right] \right\} \\&\quad \le \displaystyle \sum _{i=1}^n \sum _{j=1}^n \left\{ \sum _{k_1=1}^n \sum _{k_2=1}^n |a_{ik_1}| |a_{ik_2}| \delta _0 + \sum _{k_1=1}^n \sum _{k_2=1}^n |d_{k_1j}| |d_{k_2j}| \delta _0 \right. \\&\left. \qquad + 2 \sum _{k_1=1}^n \sum _{k_2=1}^n |a_{ik_1}| |d_{k_2j}| \delta _0 \right\} \\&\quad < 4 n^2 M \delta _0, \end{aligned}$$

where the last step uses \( \Vert A\Vert _F^2 < M \) and \( \Vert B^{h-1} \Vert _F^2 < M \), the latter following from Lemma 3.

Hence, for any \(\epsilon _0 > 0,\) there exists \( \displaystyle \delta _0 = \frac{\epsilon _0}{4 n^2 M} > 0,\) such that for any \(A, B\) with \( \Vert A^{h-1} - B^{h-1} \Vert _F^2 < \delta _0, \) we have \( \Vert A^h - B^h \Vert _F^2 < \epsilon _0. \) This completes Step 2 and the proof of Lemma 4. \(\square \)
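An illustrative numerical check of the lemma's conclusion (the test matrices and perturbation size are arbitrary choices for this sketch):

    # Check: closeness of A and B in Frobenius norm propagates to their powers,
    # with the constant growing like 4 n^2 M, as in Lemma 4.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 30
    A = rng.standard_normal((n, n)) / n           # ||A||_F^2 = O(1)
    B = A + 1e-8 * rng.standard_normal((n, n))    # ||A - B||_F^2 tiny (delta_0)

    for h in range(1, 5):
        diff = np.linalg.norm(np.linalg.matrix_power(A, h)
                              - np.linalg.matrix_power(B, h), "fro") ** 2
        print(h, diff)                            # stays of order delta_0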

Proof of Theorem 1

Let \(\alpha > \lambda _1 ( \varSigma _{0}^{-1} )\), where \( \lambda _1 ( \varSigma _{0}^{-1} )\) is the largest eigenvalue of \( \varSigma _{0}^{-1} \). Then the exact loglikelihood can be derived using a matrix Taylor expansion:

$$\begin{aligned} \log f_{Y_\mathrm{O}} (\rho )= & {} - \frac{n}{2} \log Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1} Y_\mathrm{O} + \frac{1}{2} \log |\varSigma _{0}^{-1}| \\= & {} - \frac{n}{2} \log Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1} Y_\mathrm{O} \\&+ \frac{1}{2} \left[ n \log (\alpha ) - \displaystyle \sum _{k=1}^{\infty } \frac{1}{k} \text {tr} \left( I_n - \frac{ \varSigma _{0}^{-1} }{ \alpha } \right) ^k \right] \\:= & {} T_1 + T_2 , \end{aligned}$$

where \(\varSigma _0^{-1} \equiv \varOmega _{11} -\varOmega _{12} \varOmega _{22}^{-1} \varOmega _{21} \).
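This matrix Taylor expansion is easy to confirm numerically. An illustrative sketch (the SPD test matrix, the choice of \(\alpha \), and the truncation level 300 are arbitrary assumptions of this sketch):

    # Check: for alpha > lambda_1(S),
    #     log det S = n log(alpha) - sum_{k>=1} (1/k) tr[(I - S/alpha)^k],
    # here truncated at 300 terms for a well-conditioned SPD test matrix.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    G = rng.standard_normal((n, n))
    S = G @ G.T / n + np.eye(n)               # plays the role of Sigma_0^{-1}
    alpha = 1.1 * np.linalg.eigvalsh(S).max()

    B = np.eye(n) - S / alpha
    logdet, Bk = n * np.log(alpha), np.eye(n)
    for k in range(1, 301):
        Bk = Bk @ B
        logdet -= np.trace(Bk) / k
    print(np.isclose(logdet, np.linalg.slogdet(S)[1]))   # True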

The approximated loglikelihood under the RMLE method can be written as follows:

$$\begin{aligned}&\log _\mathrm{RMLE} f_{Y_\mathrm{O}} (\rho )\\&\quad = - \frac{n}{2} \log Y_\mathrm{O} ^{T} \widetilde{\varSigma }_{0} ^{-1} Y_\mathrm{O} + \frac{1}{2} \log | \widetilde{\varSigma }_{0} ^{-1} | \\&\quad = - \frac{n}{2} \log Y_\mathrm{O} ^{T} \widetilde{\varSigma }_{0} ^{-1} Y_\mathrm{O} \\&\qquad + \frac{1}{2} \left\{ n \log (\alpha ) {-} \displaystyle \sum _{k=1}^m \left[ \frac{1}{kp} \displaystyle \sum _{i=1}^p g_i^T \left( I_n {-} \frac{ {\widetilde{\varSigma }}_{0} ^{-1} }{ \alpha } \right) ^k g_i \right] \right\} \\&\quad := \widetilde{T}_1 + \widetilde{T}_2 , \end{aligned}$$

where \( \widetilde{\varSigma }_0 ^{-1} \equiv \varOmega _{11} -\varOmega _{12} ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \varOmega _{21} \).
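In the approximated loglikelihood, each trace is replaced by an average over p Gaussian probe vectors \(g_i\), so only matrix-vector products are needed. A minimal sketch of this Hutchinson-type randomized log-determinant estimator, in the spirit of Boutsidis et al. (2015); the helper rand_logdet and its default tuning values are illustrative assumptions, not the paper's code:

    # Randomized log det: tr[(I - S/alpha)^k] is replaced by an average of
    # Gaussian quadratic forms g_i^T (I - S/alpha)^k g_i.
    import numpy as np

    def rand_logdet(S, alpha, p=50, m=100, rng=None):
        """Randomized approximation of log det S for SPD S, alpha > lambda_1(S)."""
        rng = rng or np.random.default_rng()
        n = S.shape[0]
        G = rng.standard_normal((n, p))        # probe vectors g_1, ..., g_p
        V, est = G.copy(), n * np.log(alpha)
        for k in range(1, m + 1):
            V = V - (S @ V) / alpha            # V = (I - S/alpha)^k G via mat-vecs
            est -= np.sum(G * V) / (k * p)     # (1/p) sum_i g_i^T (...)^k g_i
        return est

    rng = np.random.default_rng(5)
    n = 200
    Gm = rng.standard_normal((n, n))
    S = Gm @ Gm.T / n + np.eye(n)
    alpha = 1.1 * np.linalg.eigvalsh(S).max()
    print(rand_logdet(S, alpha, rng=rng), np.linalg.slogdet(S)[1])

Only products of the form \(S v\) are required, which is what makes the estimator attractive when \(S\) is large and sparse or otherwise cheap to apply.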

To complete the proof of Theorem 1, under some assumptions and regularity conditions, we need to prove the following two steps.

  • Step 1 As \( \displaystyle N \rightarrow \infty , n \rightarrow \infty \) and \( \displaystyle \frac{n}{N} = c\), \(|\widetilde{T}_1 - T_1 | = o(n^{-1/2})\).

  • Step 2 As \( \displaystyle N \rightarrow \infty , n \rightarrow \infty , \frac{n}{N} = c,\) and \( p,m \rightarrow \infty \), \(|\widetilde{T}_2 - T_2 | = o(n^{-1/2})\).

We first prove Step 1, \(|\widetilde{T}_1 - T_1 | = o(n^{-1/2})\). From Theorem 9 of Wang et al. (2016), we have the following result:

$$\begin{aligned}&\Vert \varOmega _{22} - \varOmega _{22}^{ss} \Vert _F\\&\quad \le \left\{ \eta \left( \Vert \varOmega _{22} - \varOmega _{22,k} \Vert _F^2 - \frac{ \left[ \sum _{i=k+1}^{N-n} \tau _i (\varOmega _{22}) \right] ^2 }{ N-n-k } \right) \right\} ^{1/2} := \epsilon _1, \end{aligned}$$

where \(\displaystyle \eta := \frac{ \sum _{i=1}^{k} \tau _i^2 (\varOmega _{22}) }{\sum _{i=1}^{N-n} \tau _i^2 (\varOmega _{22}) } = \frac{\Vert \varOmega _{22,k} \Vert _F^2}{\Vert \varOmega _{22} \Vert _F^2} \), the \(\tau _i \), \(i=1, \dots , N-n,\) are the eigenvalues of \( \varOmega _{22}\), and \( \varOmega _{22,k} \) is the best rank-\(k\) approximation to \( \varOmega _{22}. \) Here, we assume \(\epsilon _1 = o(n^{-7/2}). \)
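The surrogate \( {\tilde{ \varOmega }_{22}} ^{ss} \) comes from the column-selection (SPSD sketching) scheme of Wang et al. (2016). As a rough illustration of the underlying idea only, here is a plain Nystrom-type column approximation; the uniform sampling, the helper name spsd_column_approx, and the test matrix are assumptions of this sketch, not the authors' exact algorithm:

    # Low-rank SPSD surrogate from c sampled columns: Omega ~ C @ pinv(W) @ C.T,
    # avoiding any factorization of the full (N - n) x (N - n) block.
    import numpy as np

    def spsd_column_approx(Omega, c, rng=None):
        """Nystrom-type approximation of an SPSD matrix from c sampled columns."""
        rng = rng or np.random.default_rng()
        idx = rng.choice(Omega.shape[0], size=c, replace=False)
        C = Omega[:, idx]                      # sampled columns
        W = Omega[np.ix_(idx, idx)]            # intersection block
        return C @ np.linalg.pinv(W) @ C.T

    rng = np.random.default_rng(6)
    m = 300
    U = rng.standard_normal((m, 20))
    Omega22 = U @ U.T + 0.01 * np.eye(m)       # SPSD, approximately rank 20
    approx = spsd_column_approx(Omega22, c=40, rng=rng)
    print(np.linalg.norm(Omega22 - approx, "fro") / np.linalg.norm(Omega22, "fro"))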

Under Assumption 1, \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded; hence there exists \( \displaystyle M_1 = \frac{ (N-n)^{1/2}}{\tau _\mathrm{min}} > 0\) such that \(\Vert \varOmega _{22}^{-1} \Vert _F \le M_1 \). By Lemma 1, there exists \( \displaystyle M_2 = \frac{ (N-n)^{1/2} }{\sigma _\mathrm{min}} > 0\) such that \( \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \le M_2,\) where \( \tau _\mathrm{min}\) and \(\sigma _\mathrm{min}\) are the smallest eigenvalues of \(\varOmega _{22}\) and \( {\tilde{ \varOmega }_{22}} ^{ss} \), respectively.

Hence, we have

$$\begin{aligned} \Vert \varOmega _{22}^{-1} - ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F= & {} \Vert \varOmega _{22}^{-1}( \varOmega _{22} - {{\tilde{ \varOmega }_{22}} ^{ss}}) ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \\\le & {} \Vert \varOmega _{22}^{-1} \Vert _F \Vert \varOmega _{22} {-} {{\tilde{ \varOmega }_{22}} ^{ss}} \Vert _F \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \\< & {} M_1 \epsilon _1 M_2 . \end{aligned}$$

Further, we can derive

$$\begin{aligned}&| Y_\mathrm{O} ^{T} \widetilde{\varSigma }_{0} ^{-1}Y_\mathrm{O} - Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} |\\&\quad = \Vert Y_\mathrm{O} ^{T} (\varOmega _{11} -\varOmega _{12} ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \varOmega _{21}) Y_\mathrm{O} \\&\qquad -\,Y_\mathrm{O} ^{T} (\varOmega _{11} -\varOmega _{12} \varOmega _{22}^{-1} \varOmega _{21} )Y_\mathrm{O} \Vert _F \\&\quad = \Vert Y_\mathrm{O} ^{T} \varOmega _{12} (\varOmega _{22}^{-1} - ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} )\varOmega _{21} Y_\mathrm{O} \Vert _F \\&\quad \le \Vert Y_\mathrm{O}^{T} \varOmega _{12} \Vert _F \Vert \varOmega _{22}^{-1} - ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \Vert \varOmega _{21} Y_\mathrm{O} \Vert _F \\&\quad < c_1 M_1 \epsilon _1 M_2 c_1. \end{aligned}$$

Note that \(c_1\) is bounded as \( N \rightarrow \infty \) and \( n \rightarrow \infty \) based on Assumption 2.

Let \(a = \max \{Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}, Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} \} \) and \( b = \min \{Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}, Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} \} \). By Lemma 2 and its proof, the series \( \sum \nolimits _{t=1}^{\infty } \frac{1}{ta} (1-\frac{b}{a})^{t-1} \) converges to some \(S_1 > 0\) with \(S_1 \le \frac{1}{b}\). Notice that neither \( \widetilde{ \varSigma }_{0}^{-1} \) nor \( \varSigma _{0} ^{-1} \) is sparse, since \( ( {{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \) and \( { \varOmega }_{22} ^{-1} \) are not sparse. Without loss of generality, let \( D_{n \times n} = \{ d_{ij} \}_{n \times n} \) denote either \( \widetilde{\varSigma }_{0}^{-1} \) or \( \varSigma _{0} ^{-1} \). Then, \( Y_\mathrm{O} ^{T} DY_\mathrm{O} = \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n d_{ij} y_i y_j = O(n^2). \)

And thus we have

$$\begin{aligned} \big | \widetilde{T}_1 - T_1 \big |= & {} \frac{n}{2} \ \Big | \log (Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}) - \log (Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O}) \Big | \\< & {} \frac{n}{2} c_1 M_1 \epsilon _1 M_2 c_1 S_1 \\= & {} \frac{n}{2} c_1 \frac{ (N-n)^{1/2} }{\tau _\mathrm{min}} \epsilon _1 \frac{ (N-n)^{1/2} }{\sigma _\mathrm{min}} c_1 S_1 \\\le & {} \frac{n}{2} c_1^2 \frac{N-n}{\tau _\mathrm{min} \sigma _\mathrm{min} } \epsilon _1 \frac{1}{b} \\= & {} \frac{n}{2} c_1^2 \frac{n/c - n}{\tau _\mathrm{min} \sigma _\mathrm{min} } \epsilon _1 \frac{1}{b} \\= & {} o(n^{-1/2}). \end{aligned}$$

We now prove Step 2, \(|\widetilde{T}_2 - T_2 | = o(n^{-1/2})\). To complete the proof, we first introduce a transitional loglikelihood:

$$\begin{aligned}&\log _\mathrm{Transit} f_{Y_\mathrm{O}} (\rho )\\&\quad = - \frac{n}{2} \log Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1} Y_\mathrm{O} \\&\qquad +\, \frac{1}{2} \left\{ n \log (\alpha ) {-} \displaystyle \sum _{k=1}^m \left[ \frac{1}{kp} \displaystyle \sum _{i=1}^p g_i^T \left( I_n {-} \frac{ \varSigma _{0} ^{-1} }{ \alpha } \right) ^k g_i \right] \right\} \\&\quad := T_1 + \widehat{T}_2 , \end{aligned}$$

\( | \widetilde{T}_2 - T_2 | = | \widetilde{T}_2 - \widehat{T}_2 + \widehat{T}_2 - T_2 | \le | \widetilde{T}_2 - \widehat{T}_2 | + | \widehat{T}_2 - T_2 | \). Next, we need to prove \( | \widehat{T}_2 - T_2 | = o(n^{-1/2}) \) and \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{-1/2}) \) , respectively.

We first prove \( | \widehat{T}_2 - T_2 | = o(n^{-1/2})\) as follows:

$$\begin{aligned} | \widehat{T}_2 - T_2 |= & {} | \widehat{\mathrm{logdet}} (\varSigma _0 ^{-1}) - \mathrm{logdet} (\varSigma _0 ^{-1}) | \\= & {} \frac{1}{2} \ \left| \displaystyle \sum _{k=1}^m \left[ \frac{1}{kp} \displaystyle \sum _{i=1}^p g_i^T \left( I_n - \frac{ \varSigma _{0} ^{-1} }{ \alpha } \right) ^k g_i \right] \right. \\&\left. - \displaystyle \sum _{k=1}^{\infty } \frac{1}{k} \text {tr} \left( I_n - \frac{ \varSigma _{0}^{-1} }{ \alpha } \right) ^k \right| \\\le & {} \frac{1}{2} \left\{ \left| \displaystyle \sum _{k=1}^m \left[ \frac{1}{kp} \displaystyle \sum _{i=1}^p g_i^T \left( I_n - \frac{ \varSigma _{0} ^{-1} }{ \alpha } \right) ^k g_i \right] \right. \right. \\&\left. - \displaystyle \sum _{k=1}^m \frac{1}{k} \text {tr} \left( I_n - \frac{ \varSigma _{0}^{-1} }{ \alpha } \right) ^k \right| \\&\left. + \left| \displaystyle \sum _{k=m+1}^{\infty } \frac{1}{k} \text {tr} \left( I_n - \frac{ \varSigma _{0}^{-1} }{ \alpha } \right) ^k \right| \right\} \\= & {} \frac{1}{2} \ \left( \varGamma _1 + \varGamma _2 \right) . \\ \end{aligned}$$

From Lemma 7 in Boutsidis et al. (2015), we can get:

  1. With \(\delta = 0.01\), \(p = 20 \ln (2/\delta ) / \epsilon ^2\), and \( \epsilon = o(n^{-3/2})\), it holds with probability at least 0.99 that \(\varGamma _1 \le \epsilon \times \text {tr} [\sum \nolimits _{k=1}^\infty ( I_n - \frac{\varSigma _0^{-1}}{\alpha } )^k / k ] .\)

  2. Let \( \displaystyle \kappa ( \varSigma _0^{-1} ) = \frac{\lambda _1(\varSigma _0^{-1})}{ \lambda _n (\varSigma _0^{-1}) } \ge 1 \) and \( \epsilon = o(n^{-3/2}),\) where \( \lambda _i(\varSigma _0^{-1}) \) denotes the ith largest eigenvalue of \(\varSigma _0^{-1}\). From Boutsidis et al. (2015), we set

    $$\begin{aligned} m= & {} O \bigg [ \log \bigg ( \frac{ \log (\kappa (\varSigma _0^{-1}) ) }{2 \epsilon \log (5 \ \kappa ( \varSigma _0^{-1} ) ) } \bigg ) \times \kappa (\varSigma _0^{-1}) \bigg ] \\= & {} O \Big [ \log \left( \frac{1}{2 \epsilon } \right) \Big ] = O \bigg [ \log \bigg ( \frac{1}{2 \times o(n^{- 3/2})} \bigg ) \bigg ] \\= & {} O ( \log n ) = O \big ( n^{ 3 \bar{\epsilon } /2 } \big ) \le O \big ( n^{3/2 } \big ), \end{aligned}$$

    where \( \bar{\epsilon } \) is fixed and \( 0< \bar{\epsilon } < 1. \) Then, we can get

    $$\begin{aligned} \varGamma _2\le & {} \left[ 1 - \frac{\lambda _n ( \varSigma _0 ^{-1})}{\alpha } \right] ^ m \times \displaystyle \sum _{k=1}^\infty \frac{1}{k} \ \text {tr} \left[ \left( I_n - \frac{\varSigma _0^{-1} }{ \alpha } \right) ^k \right] \\\le & {} \epsilon \times \displaystyle \sum _{i=1}^n \log \bigg ( \frac{\alpha }{\lambda _i (\varSigma _0^{-1} )} \bigg ) = o \big (n^{- 3/2 } \big ) \times O (n) \\= & {} o(n^{- 1/2 }). \end{aligned}$$

Hence, \( | \widehat{T}_2 - T_2 | = o(n^{- 1/2 })\).

We then prove \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{-1/2}). \)

$$\begin{aligned}&| \widetilde{T}_2 - \widehat{T}_2 | = | \ \widehat{\mathrm{logdet}} ( \widetilde{ \varSigma }_0^{-1} ) - \widehat{\mathrm{logdet}} ( \varSigma _0^{-1} ) \ | \\&\quad = \frac{1}{2} \left| \displaystyle \sum _{k=1}^m \frac{1}{k} \displaystyle \sum _{i=1}^p \frac{1}{p} g_i^T \left[ \left( I_n {-} \frac{\varSigma _0^{-1}}{\alpha } \right) ^k - \left( I_n {-} \frac{\widetilde{\varSigma }_0^{-1}}{\alpha } \right) ^k \right] g_i \right| . \end{aligned}$$

Let \( \displaystyle A= I_n - \frac{\varSigma _0^{-1}}{\alpha } \), \( \displaystyle B= I_n - \frac{ \widetilde{\varSigma }_0^{-1}}{\alpha } \), and \( A^k - B^k = Q= \{q_{k_1 k_2} \}_{n \times n}. \)

Then, \( | \widetilde{T}_2 - \widehat{T}_2 | \) can be written as

$$\begin{aligned} | \widetilde{T}_2 - \widehat{T}_2 | = \displaystyle \frac{1}{2} \left| \sum _{k=1}^m \frac{1}{k} \bigg ( \sum _{i=1}^p \frac{1}{p} g_i^T Qg_i \bigg ) \right| . \end{aligned}$$

Then, we have

$$\begin{aligned}&\Vert A- B\Vert _F ^2\\&\quad = \displaystyle \frac{1}{\alpha ^2} \Vert \varSigma _0^{-1} - \widetilde{ \varSigma }_0^{-1} \Vert _F^2\\&\quad = \frac{1}{\alpha ^2} \Vert \ \varOmega _{12} \big (\varOmega _{22}^{-1} - ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \big ) \varOmega _{21} \ \Vert _F^2 \\&\quad \le \displaystyle \frac{1}{\alpha ^2} \Vert \varOmega _{12} \Vert _F^2 \Vert \varOmega _{22}^{-1} - ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F^2 \Vert \varOmega _{21} \Vert _F^2 \le \frac{1}{\alpha ^2} d_1 \epsilon _1^2 d_1. \end{aligned}$$

By Lemma 4, we have that for any \( \displaystyle k = 1, \ldots , m, \Vert A^k - B^k \Vert _F^2 < \frac{1}{\alpha ^2} d_1^2 \epsilon _1^2 4 n^2 M = o(n^{-5}), \) and hence \( \Vert Q\Vert _F = o(n^{-5/2}). \)

Thus, for any \(g_i = ( g_{i1}, \dots , g_{in})^T \sim \text {Normal}(0, I_n),\) the Cauchy-Schwarz inequality gives

$$\begin{aligned} | g_i^T Qg_i | \le \Vert g_i \Vert ^2 \ \Vert Q\Vert _2 \le \Vert g_i \Vert ^2 \ \Vert Q\Vert _F = O_p(n) \times o(n^{-5/2}) = o_p(n^{-3/2}), \end{aligned}$$

since \( E \Vert g_i \Vert ^2 = n\). Further, for any \(p \ge 1, \ | \sum \nolimits _{i=1}^p \frac{1}{p} \ g_i^T Qg_i | = o_p(n^{-3/2})\). Then, for any \( m = O(n^{3/2}), \ | \sum \nolimits _{k=1}^m \frac{1}{k} \ ( \sum \nolimits _{i=1}^p \frac{1}{p} \ g_i^T Qg_i ) | = o_p(n^{-3/2} \log m) = o_p(n^{- 1/2 })\); that is, \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{- 1/2 }) \) with probability tending to one.

Therefore, \( | \widetilde{T}_2 - {T}_2 | \le | \widetilde{T}_2 - \widehat{T}_2 | + | \widehat{T}_2 - T_2| = o(n^{- 1/2 }). \) This completes Step 2.

Lastly, we have

$$\begin{aligned}&| \ \log _\mathrm{RMLE} f_{Y_\mathrm{O}} (\rho ) - \log f_{Y_\mathrm{O}} (\rho ) \ | \\&\quad = | \ (\widetilde{T}_1 - T_1 ) + (\widetilde{T}_2 - T_2 ) \ | \\&\quad \le | \ \widetilde{T}_1 - T_1 \ | + | \ \widetilde{T}_2 - \widehat{T}_2 \ | + | \ \widehat{T}_2 - T_2 \ | \\&\quad = o(n^{- 1/2 }). ~~~ {\text {(This completes the proof of Theorem}}~1. {\text {)}} \end{aligned}$$

\(\square \)

A.2 Proof of Theorem 2

Proof

For simplicity, we write the exact loglikelihood \( \log f_{Y_\mathrm{O}} (\rho )\) as \(l (\rho )\) and the approximated loglikelihood under the RMLE method, \(\log _\mathrm{RMLE} f_{Y_\mathrm{O}} (\rho ) \), as \( \tilde{l} (\rho )\); their first derivatives are denoted \( l^{'} (\rho ) \) and \( \tilde{l} ^{'} (\rho )\), and their second derivatives \( l^{''} (\rho ) \) and \( \tilde{l} ^{''} (\rho )\), respectively.

Then, from Theorem 1 and Assumption 6, we are able to derive the following:

$$\begin{aligned}&n^{1/2} \ [ l^{'} (\rho ) - \tilde{l}^{'} (\rho ) ] = n^{1/2} \ \frac{d [l(\rho ) - \tilde{l} (\rho ) ]}{ d \rho } = o(1),\\&n^{1/2} \ [ l^{''} (\rho ) - \tilde{l}^{''} (\rho ) ] = n^{1/2} \ \frac{d^2 [l (\rho ) - \tilde{ l } (\rho ) ]}{ d \rho ^2} = o(1). \end{aligned}$$

By Taylor expansion of \( l^{'} (\rho ) \) and \( \tilde{l} ^{'} (\rho )\) at \(\rho = \rho _1\), we get:

$$\begin{aligned}&l^{'} (\hat{\rho }_\mathrm{MLE}) = l^{'} (\rho _1) + l^{''} (\rho _1) (\hat{\rho }_\mathrm{MLE} - \rho _1 ) + O(1) \overset{\mathrm{set}}{=} 0,\\&\tilde{l}^{'} (\hat{\rho }_\mathrm{RMLE}) = \tilde{l}^{'} (\rho _1) + \tilde{ l}^{''} (\rho _1) (\hat{\rho }_\mathrm{RMLE} - \rho _1 ) + O(1) \overset{\mathrm{set}}{=} 0. \end{aligned}$$

By solving the equations, we can obtain:

$$\begin{aligned}&\hat{\rho }_\mathrm{MLE} = \rho _1 - \frac{ l^{'} (\rho _1) }{ l^{''} (\rho _1) } + O \left( \frac{1}{n} \right) ,\\&\hat{\rho }_\mathrm{RMLE} = \rho _1 - \frac{ \tilde{l}^{'} (\rho _1) }{ \tilde{ l}^{''} (\rho _1) } + O \left( \frac{1}{n} \right) . \end{aligned}$$

Hence, we can obtain the following

$$\begin{aligned}&| \hat{\rho }_\mathrm{MLE} - \hat{\rho }_\mathrm{RMLE} |\\&\quad \le \Big | \frac{l^{'} (\rho _1)}{ l^{''} (\rho _1) } - \frac{ \tilde{l}^{'} (\rho _1)}{ \tilde{l}^{''} (\rho _1) } \Big | + O \left( \frac{1}{n} \right) \\&\quad \le \Big | \frac{l^{'} (\rho _1)}{ l^{''} (\rho _1) } - \frac{\tilde{ l}^{'} (\rho _1)}{ l^{''} (\rho _1) } \Big | + \Big | \frac{\tilde{ l}^{'} (\rho _1)}{ l^{''} (\rho _1) } - \frac{ \tilde{l}^{'} (\rho _1)}{ \tilde{ l}^{''} (\rho _1) } \Big | + O \left( \frac{1}{n} \right) \\&\quad = \Big | \frac{ l^{'}(\rho _1) - \tilde{l}^{'} (\rho _1) }{ l^{''} (\rho _1)} \Big | + | \tilde{l}^{'} (\rho _1) | \times \Big | \frac{ \tilde{l}^{''} (\rho _1) - l^{''} (\rho _1 )}{ l^{''} (\rho _1) \ \tilde{l}^{''} (\rho _1) } \Big | \\&\qquad + O \left( \frac{1}{n} \right) \\&\quad = o(n^{- 1/2 }). \end{aligned}$$
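The two displays above are one-step Newton updates for the exact and approximate loglikelihoods. A toy sketch (the quadratic loglikelihood and the perturbation size are invented for this illustration) of how a small score perturbation propagates to the estimator:

    def newton_step(score, hess, rho1):
        """One Newton update rho1 - l'(rho1)/l''(rho1) for a scalar loglikelihood."""
        return rho1 - score(rho1) / hess(rho1)

    # Toy strictly concave loglikelihood l(rho) = -(rho - 0.4)^2, maximized at 0.4,
    # with a slightly perturbed score standing in for the RMLE approximation.
    score_exact = lambda r: -2.0 * (r - 0.4)
    score_rmle = lambda r: -2.0 * (r - 0.4) + 1e-4
    hess = lambda r: -2.0

    print(newton_step(score_exact, hess, 0.0))   # 0.4      (exact MLE)
    print(newton_step(score_rmle, hess, 0.0))    # 0.40005  (RMLE stays close)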

On the other hand, from asymptotic theory of MLE, we have

$$\begin{aligned} n^{1/2} ( \hat{\rho }_\mathrm{MLE} - \rho _0 ) \overset{d}{\longrightarrow } N \Big ( 0, \frac{1}{I(\rho _0)} \Big ) . \end{aligned}$$

Hence,

$$\begin{aligned}&n^{1/2} ( \hat{\rho }_\mathrm{RMLE} - \rho _0 )\\&\quad = n^{1/2} ( \hat{\rho }_\mathrm{RMLE} - \hat{\rho }_\mathrm{MLE} + \hat{\rho }_\mathrm{MLE} - \rho _0 ) \\&\quad = n^{1/2} ( \hat{\rho }_\mathrm{RMLE} - \hat{\rho }_\mathrm{MLE} ) + n^{1/2} ( \hat{\rho }_\mathrm{MLE} - \rho _0 ) \\&\quad \overset{d}{\longrightarrow } N \Big ( 0, \frac{1}{I(\rho _0)} \Big ) . \end{aligned}$$

This completes the proof of Theorem 2. \(\square \)


Cite this article

Li, M., Kang, E.L. Randomized algorithms of maximum likelihood estimation with spatial autoregressive models for large-scale networks. Stat Comput 29, 1165–1179 (2019). https://doi.org/10.1007/s11222-019-09862-4
