Abstract
The spatial autoregressive (SAR) model is a classical model in spatial econometrics and has become an important tool in network analysis. However, with large-scale networks, existing methods of likelihood-based inference for the SAR model become computationally infeasible. We here investigate maximum likelihood estimation for the SAR model with partially observed responses from large-scale networks. By taking advantage of recent developments in randomized numerical linear algebra, we derive efficient algorithms to estimate the spatial autocorrelation parameter in the SAR model. Compelling experimental results from extensive simulation and real data examples demonstrate empirically that the estimator obtained by our method, called the randomized maximum likelihood estimator, outperforms the state of the art by giving smaller bias and standard error, especially for large-scale problems with moderate spatial autocorrelation. The theoretical properties of the estimator are explored, and consistency results are established.
References
Anselin, L., Bera, A.K.: Spatial dependence in linear regression models with an introduction to spatial econometrics. Stat. Textb. Monogr. 155, 237–290 (1998)
Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(4), 825–848 (2008)
Banerjee, S., Carlin, B.P., Gelfand, A.E.: Hierarchical Modeling and Analysis for Spatial Data. CRC Press, Boca Raton (2014)
Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1–3), 41–54 (1999)
Beck, N., Gleditsch, K.S., Beardsley, K.: Space is more than geography: Using spatial econometrics in the study of political economy. Int. Stud. Q. 50(1), 27–44 (2006)
Boutsidis, C., Drineas, P., Kambadur, P., Kontopoulou, E.M., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv preprint arXiv:1503.00374 (2015)
Browne, K.: Snowball sampling: using social networks to research non-heterosexual women. Int. J. Soc. Res. Methodol. 8(1), 47–60 (2005)
Burden, S., Cressie, N., Steel, D.G.: The SAR model for very large datasets: a reduced rank approach. Econometrics 3(2), 317–338 (2015)
Chen, X., Chen, Y., Xiao, P.: The impact of sampling and network topology on the estimation of social intercorrelations. J. Market. Res. 50(1), 95–110 (2013)
Cressie, N., Johannesson, G.: Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 209–226 (2008)
Darmofal, D.: Spatial Analysis for the Social Sciences. Cambridge University Press, Cambridge (2015)
Doreian, P.: Estimating linear models with spatially distributed data. Sociol. Methodol. 12, 359–388 (1981)
Doreian, P., Freeman, L., White, D., Romney, A.: Models of network effects on social actors. In: Research Methods in Social Network Analysis pp. 295–317 (1989)
Fujimoto, K., Chou, C.P., Valente, T.W.: The network autocorrelation model using two-mode data: affiliation exposure and potential bias in the autocorrelation parameter. Soc. Netw. 33(3), 231–243 (2011)
Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp. 1207–1214 (2012)
Haggett, P.: Hybridizing alternative models of an epidemic diffusion process. Econ. Geogr. 52(2), 136–146 (1976)
Lee, L.F.: Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72(6), 1899–1925 (2004)
Lee, L.F., Liu, X.: Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econ. Theory 26(1), 187–230 (2010)
Lee, L., Yu, J.: Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154(2), 165–185 (2010)
Lee, L.F., Liu, X., Lin, X.: Specification and estimation of social interaction models with network structures. Econ. J. 13(2), 145–176 (2010)
Leenders, R.T.: Modeling social influence through network autocorrelation: constructing the weight matrix. Soc. Netw. 24(1), 21–47 (2002)
LeSage, J., Pace, R.K.: Introduction to Spatial Econometrics. Chapman and Hall, Boca Raton (2009)
LeSage, J.P., Pace, R.K.: Models for spatially dependent missing data. J. Real Estate Financ. Econ. 29(2), 233–254 (2004)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford Large Network Dataset Collection (2014)
Lichstein, J.W., Simons, T.R., Shriner, S.A., Franzreb, K.E.: Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72(3), 445–463 (2002)
Lin, X., Lee, L.F.: GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157(1), 34–52 (2010)
Mahoney, M.W., et al.: Randomized algorithms for matrices and data. Found. Trends® Mach. Learn. 3(2), 123–224 (2011)
O’Malley, A.J.: The analysis of social network data: an exciting frontier for statisticians. Stat. Med. 32(4), 539–555 (2013)
Ord, K.: Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 70(349), 120–126 (1975)
OSC: Ohio Supercomputer Center. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73 (1987). Accessed 21 Dec 2018
Pace, R.K., Barry, R.: Sparse spatial autoregressions. Stat. Probab. Lett. 33(3), 291–297 (1997)
Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2006)
Robins, G.: A tutorial on methods for the modeling and analysis of social network data. J. Math. Psychol. 57(6), 261–274 (2013)
Robins, G., Pattison, P., Elliott, P.: Network models for social influence processes. Psychometrika 66(2), 161–189 (2001)
Shao, J.: Mathematical Statistics. Springer, New York (2003)
Smirnov, O., Anselin, L.: Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput. Stat. Data Anal. 35(3), 301–319 (2001)
Smirnov, O.A.: Computation of the information matrix for models with spatial interaction on a lattice. J. Comput. Graph. Stat. 14(4), 910–927 (2005)
Stewart, G.: Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix. Numer. Math. 83(2), 313–323 (1999)
Suesse, T.: Estimation of spatial autoregressive models with measurement error for large data sets. Comput. Stat. 33(4), 1627–1648 (2018)
Suesse, T.: Marginal maximum likelihood estimation of SAR models with missing data. Comput. Stat. Data Anal. 120, 98–110 (2018)
Suesse, T., Chambers, R.: Using social network information for survey estimation. J. Off. Stat. 34(1), 181–209 (2018)
Suesse, T., Zammit-Mangion, A.: Computational aspects of the em algorithm for spatial econometric models with missing data. J. Stat. Comput. Simul. 87(9), 1767–1786 (2017)
Sun, D., Tsutakawa, R.K., Speckman, P.L.: Posterior distribution of hierarchical models using CAR(1) distributions. Biometrika 86(2), 341–350 (1999)
Wang, S., Luo, L., Zhang, Z.: SPSD matrix approximation vis column selection: theories, algorithms, and extensions. J. Mach. Learn. Res. 17(49), 1–49 (2016)
Wang, W., Lee, L.F.: Estimation of spatial autoregressive models with randomly missing data in the dependent variable. Econ. J. 16(1), 73–102 (2013)
Whittle, P.: On stationary processes in the plane. Biometrika 41, 434–449 (1954)
Woodruff, D.P., et al.: Sketching as a tool for numerical linear algebra. Found. Trends® Theor. Comput. Sci. 10(1–2), 1–157 (2014)
Zhou, J., Tu, Y., Chen, Y., Wang, H.: Estimating spatial autocorrelation with sampled network data. J. Bus. Econ. Stat. 35(1), 130–138 (2017)
Acknowledgements
We would like to thank the associate editor and two reviewers of Statistics and Computing for their insightful comments that greatly improved this work. Li’s work is partially supported by the Henry Laws Fellowship Award and the Taft Research Center at the University of Cincinnati. Kang’s research is partially supported by the Simons Foundation Collaboration Award (#317298) and the Taft Research Center at the University of Cincinnati. This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center (OSC 1987). We would like to thank Dr. Shan Ba, Dr. Won Chang, Dr. Noel Cressie, Dr. Alex B. Konomi, and Dr. Siva Sivaganesan for their helpful suggestions.
Appendix
This section contains the proofs of theorems and lemmas for the paper.
A.1 Proof of Theorem 1
To prove Theorem 1, we first need to state and prove four lemmas.
Lemma 1
Assume that \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded for any n and N with \(n < N\). Then \(\Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \) is also bounded.
Proof
Since \( \varOmega _{22} \) is a symmetric positive semi-definite (SPSD) matrix, all its eigenvalues \( \tau _i \) are nonnegative. Let \( \tau _\mathrm{min}\) be the smallest eigenvalue of \(\varOmega _{22}. \) Since \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, there exists \( M_1 > 0\) such that \(\Vert \varOmega _{22}^{-1} \Vert _F \le M_1. \)
Since \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, \( \varOmega _{22} \) is invertible and \(\tau _\mathrm{min} \ne 0;\) therefore \(\tau _i >0\) for \(i = 1, \dots , N-n.\) Thus, \( \varOmega _{22} \) is symmetric positive definite (SPD). From Theorem 6 of Wang et al. (2016), \( {\tilde{ \varOmega }_{22}} ^{ss} \) is also SPD; thus, all its eigenvalues \(\sigma _i\), \(i=1,\dots ,N-n,\) are positive. Let \(\sigma _\mathrm{min}\) be the smallest eigenvalue of \( {\tilde{ \varOmega }_{22}} ^{ss} \). Then, there exists \( \displaystyle M_2 = \frac{(N-n)^{1/2}}{\sigma _\mathrm{min}} > 0\) such that \( \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \le M_2. \)
\(\square \)
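The bound used here, \( \Vert \varOmega _{22}^{-1} \Vert _F \le (N-n)^{1/2}/\tau _\mathrm{min} \), follows from \( \Vert \varOmega _{22}^{-1} \Vert _F^2 = \sum _i \tau _i^{-2} \le (N-n)/\tau _\mathrm{min}^2 \). A minimal numerical check on a synthetic SPD matrix (our own construction, standing in for \( \varOmega _{22} \)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = rng.standard_normal((n, n))
Omega = X @ X.T + n * np.eye(n)            # synthetic SPD stand-in for Omega_22
tau_min = np.linalg.eigvalsh(Omega).min()  # smallest eigenvalue
fro_inv = np.linalg.norm(np.linalg.inv(Omega), "fro")
# ||Omega^{-1}||_F^2 = sum_i tau_i^{-2} <= n / tau_min^2
assert fro_inv <= np.sqrt(n) / tau_min + 1e-9
```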
Lemma 2
For any \(a \ge b > 0\) and any \( \epsilon > 0,\) there exists \(\delta > 0\) such that \(|a-b| < \delta \) implies \(| \log a - \log b | < \epsilon \).
Proof
The Taylor series of \( f(x) = \log x\) at \(x=x_0\) is \( \log x = \log x_0 + \sum \nolimits _{t=1}^{\infty } \frac{(-1)^{t+1}}{t} (1-\frac{x_0}{x})^t.\) Thus,
By Leibniz's alternating series test, a series \( \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} u_t \) with \(u_t >0\) converges if the following two conditions are satisfied: 1. \(u_t \ge u_{t+1}\) for all \(t \ge N\), for some \(N \in \mathcal {N} \); 2. \( {\lim \nolimits _{t \rightarrow \infty }} u_t = 0.\)
In our case, \( \displaystyle u_t = \frac{1}{ta} (1-\frac{b}{a})^{t-1}. \) Let us check those conditions one after another.
Condition 0: for any \(\displaystyle a>b>0\), \( u_t = \frac{1}{ta} (1-\frac{b}{a})^{t-1} > 0.\)
Condition 1: for any \(\displaystyle a>b>0\), \( - \frac{b}{a} \le \frac{1}{t}, \) so \( \displaystyle \frac{1}{ta} (1-\frac{b}{a})^{t-1} \ge \frac{1}{(t+1)a} (1-\frac{b}{a})^t, \) which implies \(u_t \ge u_{t+1}.\)
Condition 2: for any \( a>b>0\), \( {\lim \nolimits _{t \rightarrow \infty }} u_t = {\lim \nolimits _{t \rightarrow \infty }} \frac{1}{ta} (1-\frac{b}{a})^{t-1} = 0. \)
Since all conditions are satisfied, the series converges; let \(S>0\) be such that \( | \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} \frac{1}{ta} (1-\frac{b}{a})^{t-1}| = S. \) Let \( \displaystyle \delta = \frac{\epsilon }{S}\). Then, for any \(a, b\) with \(|a-b| < \delta \), \( | \log a - \log b | = |a-b| \times | \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} \frac{1}{ta} (1-\frac{b}{a})^{t-1}| < \delta \times S = \frac{\epsilon }{S} \times S = \epsilon .\)
Note: \(S\) is bounded; by the mean value theorem, \( \displaystyle S = \frac{|\log a - \log b|}{|a-b|} = \frac{1}{\xi } \) for some \( \xi \) between \(b\) and \(a\), so \( \displaystyle S < \frac{1}{b}. \) \(\square \)
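For \(a > b > 0\), the expansion behind Lemma 2 gives \( \log a - \log b = (a-b)\sum \nolimits _{t=1}^{\infty } \frac{1}{ta}(1-\frac{b}{a})^{t-1} \), with every term positive in this range. A quick numerical sanity check of this identity; the values below are chosen purely for illustration:

```python
import math

def log_diff_series(a, b, terms=200):
    # Partial sum of (a - b) * sum_{t>=1} (1/(t*a)) * (1 - b/a)^(t-1)
    u = 1.0 - b / a
    s = sum((1.0 / (t * a)) * u ** (t - 1) for t in range(1, terms + 1))
    return (a - b) * s

# The partial sum should match log(a) - log(b) once the tail is negligible.
assert abs(log_diff_series(2.0, 1.0) - math.log(2.0)) < 1e-8
assert abs(log_diff_series(5.0, 3.0) - (math.log(5.0) - math.log(3.0))) < 1e-10
```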
Lemma 3
Let \( A\) be an \( n \times n\) matrix. If there exists \(M > 0\) such that \( \Vert A\Vert _F^2 < M \le n \), then for any \(k = 1, \ldots , m \) , we have \( \Vert A^k \Vert _F^2 < M,\) where m is any fixed integer.
Proof
Let \( \nu _i's\), \(i = 1, \ldots , n\), denote the eigenvalues of matrix \( A\). Since \( \Vert A\Vert _F^2 = \text {tr} (A^T A) = \sum \nolimits _{i=1}^n \nu _i^2 \le n |\nu _\mathrm{max} |^2 < M, \) then we have
\(\square \)
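In the application of Lemma 3 below, \( A = I_n - \varSigma _0^{-1}/\alpha \) is symmetric with spectral radius below one (by the choice \( \alpha > \lambda _1(\varSigma _0^{-1}) \)), under which the conclusion \( \Vert A^k \Vert _F^2 < M \) is immediate. A quick check under those conditions, on a synthetic symmetric matrix of our own construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
ev = rng.uniform(-0.95, 0.95, size=n)   # eigenvalues with |ev_i| < 1
A = (Q * ev) @ Q.T                      # symmetric matrix with spectrum ev
M = np.linalg.norm(A, "fro") ** 2 + 1e-9
# ||A^k||_F^2 = sum_i ev_i^(2k) <= sum_i ev_i^2 < M when |ev_i| < 1
for k in range(1, 6):
    assert np.linalg.norm(np.linalg.matrix_power(A, k), "fro") ** 2 < M
```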
Lemma 4
Let \(A_{n \times n} = \{a_{ij} \}_{n \times n},\ B_{n \times n} = \{ b_{ij} \}_{n \times n}\), and assume \( \Vert A\Vert _F^2< M = O(1) \le n, \ \Vert B\Vert _F^2 < M = O(1) \le n,\) then for any \(\epsilon _0 > 0,\) there exists \(\delta _0 > 0, \) such that for any \( A, B\) with \( \Vert A- B\Vert _F^2 < \delta _0, \) we have \( \Vert A^h - B^h \Vert _F^2 < \epsilon _0, \) where \(h=1, \ldots , m,\) and m is any fixed integer.
Proof
By using mathematical induction, we need to prove:
-
Step 1 If \( \Vert A- B\Vert _F^2 < \delta _0 \), then \( \Vert A^2 - B^2 \Vert _F^2 < \epsilon _0. \)
-
Step 2 For any \( h = 1, \dots , m-1,\) if \( \Vert A^{h-1} - B^{h-1} \Vert _F^2 < \delta _0 \), then \( \Vert A^{h} - B^{h} \Vert _F^2 < \epsilon _0. \)
We first prove Step 1. For any \(i, j = 1, \dots , n, ~~ | a_{ij} - b_{ij} | \le \max \nolimits _{i,j = 1, \dots , n} | a_{ij} - b_{ij} | \le \Vert A- B\Vert _F < \delta _0^{1/2} \) .
\( A^2 = A\times A= \left( \sum \nolimits _{k=1}^n a_{ik} a_{kj}\right) _{n \times n} \), and \( B^2 = B\times B= \left( \sum \nolimits _{k=1}^n b_{ik} b_{kj}\right) _{n \times n} \)
Let \( A^T A= \{ t_{ij} \}_{n \times n} \) and let \(\text {s}(A^T A) := \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n t_{ij} , \) then we have the following inequality:
Similarly, we can get \(\sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n \left[ \sum \nolimits _{k_1=1}^n \sum \nolimits _{k_2=1}^n b_{ik_1} b_{ik_2} \delta _0\right] < n^2 M \delta _0 \) , and \( 2 \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n \left[ \sum \nolimits _{k_1=1}^n \sum \nolimits _{k_2=1}^n a_{ik_1} b_{k_2 j} \delta _0 \right] < 2 n^2 M \delta _0 . \) Thus, from (8), we can derive \( \Vert A^2 - B^2 \Vert _F^2 < 4 n^2 M \delta _0. \)
Hence, for any \(\epsilon _0 > 0,\) there exists \( \displaystyle \delta _0 = \frac{\epsilon _0}{4 n^2 M} > 0,\) such that for any \(\Vert A- B\Vert _F^2 < \delta _0, \) we have \( \Vert A^2 - B^2 \Vert _F^2 < \epsilon _0. \text { (End of proving Step 1.) } \)
We now prove Step 2.
Let \( A^{h-1} = \{ c_{ij} \}_{n \times n}, B^{h-1} = \{ d_{ij} \}_{n \times n}. \)
\( \forall i, j = 1, \dots , n, ~~ | c_{ij} - d_{ij} | \le \max \nolimits _{i,j = 1, \dots , n} | c_{ij} - d_{ij} | \le \Vert A^{h-1} - B^{h-1} \Vert _F < \delta _0^{1/2}. \)
\( A^h = A\times A^{h-1} = \left( \sum \nolimits _{k=1}^n a_{ik} c_{kj}\right) _{n \times n} \), and \( B^h = B\times B^{h-1} = \left( \sum \nolimits _{k=1}^n b_{ik} d_{kj}\right) _{n \times n} \)
Similar to Proof of Step 1, we have
Hence, for any \(\epsilon _0 > 0,\) there exists \( \displaystyle \delta _0 = \frac{\epsilon _0}{4 n^2 M} > 0,\) such that for any \(A, B\) with \( \Vert A^{h-1} - B^{h-1} \Vert _F^2 < \delta _0, \) we have \( \Vert A^h - B^h \Vert _F^2 < \epsilon _0. \) (End of proving Step 2 and Lemma 4) \(\square \)
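A numerical spot-check of the Step 1 bound \( \Vert A^2 - B^2 \Vert _F^2 < 4 n^2 M \delta _0 \) on random matrices (our synthetic example; the \(4n^2\) factor is conservative, since \( \Vert A^2 - B^2 \Vert _F \le (\Vert A\Vert _F + \Vert B\Vert _F) \Vert A - B\Vert _F \) already gives \(4 M \delta _0\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 8, 8.0
A = rng.standard_normal((n, n))
A *= np.sqrt(0.9 * M) / np.linalg.norm(A, "fro")   # enforce ||A||_F^2 < M
B = A + 0.01 * rng.standard_normal((n, n))          # small perturbation of A
delta0 = np.linalg.norm(A - B, "fro") ** 2 * 1.01   # any delta0 > ||A-B||_F^2
lhs = np.linalg.norm(A @ A - B @ B, "fro") ** 2
assert lhs < 4 * n ** 2 * M * delta0
```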
Proof of Theorem 1
Let \(\alpha > \lambda _1 ( \varSigma _{0}^{-1} )\), where \( \lambda _1 ( \varSigma _{0}^{-1} )\) is the largest eigenvalue of \( \varSigma _{0}^{-1} \). Then, the exact loglikelihood can be derived using matrix Taylor expansion.
where \(\varSigma _0^{-1} \equiv \varOmega _{11} -\varOmega _{12} \varOmega _{22}^{-1} \varOmega _{21} \).
And the approximated loglikelihood using RMLE method can be written in the following way.
where \( \widetilde{\varSigma }_0 ^{-1} \equiv \varOmega _{11} -\varOmega _{12} ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \varOmega _{21} \).
To complete the proof of Theorem 1, under some assumptions and regularity conditions, we need to prove the following two steps.
-
Step 1 As \( \displaystyle N \rightarrow \infty , n \rightarrow \infty \) and \( \displaystyle \frac{n}{N} = c\), \(|\widetilde{T}_1 - T_1 | = o(n^{-1/2})\).
-
Step 2 As \( \displaystyle N \rightarrow \infty , n \rightarrow \infty , \frac{n}{N} = c,\) and \( p,m \rightarrow \infty \), \(|\widetilde{T}_2 - T_2 | = o(n^{-1/2})\).
We first prove Step 1, \(|\widetilde{T}_1 - T_1 | = o(n^{-1/2})\). From Theorem 9 of Wang et al. (2016), we have the following result.
where \(\displaystyle \eta = \frac{ \sum _{i=1}^{k} \tau _i^2 (\varOmega _{22}) }{\sum _{i=1}^{N-n} \tau _i^2 (\varOmega _{22}) } = \frac{\Vert \varOmega _{22,k} \Vert _F^2}{\Vert \varOmega _{22} \Vert _F^2} \), \(\tau _i (\varOmega _{22}) \) denotes the ith eigenvalue of \( \varOmega _{22}\), \(i=1, \dots , N-n, \) and \( \varOmega _{22,k} \) is the best rank-k approximation to \( \varOmega _{22}. \) Here, we assume \(\epsilon _1 = o(n^{-7/2}). \)
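The ratio \(\eta \) and the column-based SPSD approximation behind \( {\tilde{ \varOmega }_{22}} ^{ss} \) can be sketched as follows. This is a generic Nyström-type reconstruction \( C W^{+} C^{T} \) on a synthetic low-rank SPSD matrix, not the exact sketching model of Wang et al. (2016); the names and sizes are ours:

```python
import numpy as np

rng = np.random.default_rng(3)
N_minus_n, r, k = 50, 5, 10
X = rng.standard_normal((N_minus_n, r))
K = X @ X.T                                    # SPSD matrix of rank r

tau = np.linalg.eigvalsh(K)[::-1]              # eigenvalues, descending
eta = np.sum(tau[:k] ** 2) / np.sum(tau ** 2)  # ||K_k||_F^2 / ||K||_F^2
assert 0.999 < eta <= 1.0                      # k >= rank(K), so eta ~ 1

idx = np.arange(k)                             # a selected column subset
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_ss = C @ np.linalg.pinv(W) @ C.T             # Nystrom-type approximation
# Exact recovery holds when rank(W) = rank(K)
assert np.allclose(K_ss, K, atol=1e-6)
```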
Under Assumption 1, \(\Vert \varOmega _{22}^{-1} \Vert _F \) is bounded, so there exists \( \displaystyle M_1 = \frac{ (N-n)^{1/2}}{\tau _\mathrm{min}} > 0\) such that \(\Vert \varOmega _{22}^{-1} \Vert _F \le M_1 \). By Lemma 1, there exists \( \displaystyle M_2 = \frac{ (N-n)^{1/2} }{\sigma _\mathrm{min}} > 0\) such that \( \Vert ({{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \Vert _F \le M_2,\) where \( \tau _\mathrm{min}\) and \(\sigma _\mathrm{min}\) are the smallest eigenvalues of \(\varOmega _{22}\) and \( {\tilde{ \varOmega }_{22}} ^{ss} \), respectively.
Hence, we have
Further, we can derive
Note that \(c_1\) is bounded as \( N \rightarrow \infty \) and \( n \rightarrow \infty \) based on Assumption 2.
Let \( a = \max \{Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}, Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} \} \) and \( b = \min \{Y_\mathrm{O} ^{T} \widetilde{ \varSigma }_{0} ^{-1}Y_\mathrm{O}, Y_\mathrm{O} ^{T} \varSigma _{0} ^{-1}Y_\mathrm{O} \} \). By Lemma 2, there exists \(S_1 >0\) such that \( | \sum \nolimits _{t=1}^{\infty } (-1)^{t-1} \frac{1}{ta} (1-\frac{b}{a})^{t-1}| = S_1. \) Notice that neither \( \widetilde{ \varSigma }_{0}^{-1} \) nor \( \varSigma _{0} ^{-1} \) is sparse, since neither \( ( {{\tilde{ \varOmega }_{22}} ^{ss}})^{-1} \) nor \( { \varOmega }_{22} ^{-1} \) is sparse. Without loss of generality, let \( D_{n \times n} = \{ d_{ij} \}_{n \times n} \) denote either \( \widetilde{\varSigma }_{0}^{-1} \) or \( \varSigma _{0} ^{-1} \). Then, \( Y_\mathrm{O} ^{T} DY_\mathrm{O} = \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n d_{ij} y_i y_j = O(n^2). \)
And thus we have
We now prove Step 2, \(|\widetilde{T}_2 - T_2 | = o(n^{-1/2})\). To complete the proof, we need to first introduce a transitional loglikelihood.
\( | \widetilde{T}_2 - T_2 | = | \widetilde{T}_2 - \widehat{T}_2 + \widehat{T}_2 - T_2 | \le | \widetilde{T}_2 - \widehat{T}_2 | + | \widehat{T}_2 - T_2 | \). Next, we need to prove \( | \widehat{T}_2 - T_2 | = o(n^{-1/2}) \) and \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{-1/2}) \) , respectively.
We first prove \( | \widehat{T}_2 - T_2 | = o(n^{-1/2})\) as follows:
From Lemma 7 in Boutsidis et al. (2015), we can get:
-
1.
With \(\delta = 0.01, p = 20 \ln (2/\delta ) / \epsilon ^2\), and \( \epsilon = o(n^{-3/2})\), with probability at least 0.99, \(\varGamma _1 \le \epsilon \times \text {tr} [\sum \nolimits _{k=1}^\infty ( I_n - \frac{\varSigma _0^{-1}}{\alpha } )^k / k ] .\)
-
2.
Let \( \displaystyle \kappa ( \varSigma _0^{-1} ) = \frac{\lambda _1(\varSigma _0^{-1})}{ \lambda _n (\varSigma _0^{-1}) } \ge 1 \) and \( \epsilon = o(n^{-3/2}),\) where \( \lambda _i(\varSigma _0^{-1}) \) denotes the ith largest eigenvalue of \(\varSigma _0^{-1}\). From Boutsidis et al. (2015), we set
$$\begin{aligned} m= & {} O \bigg [ \log \bigg ( \frac{ \log (\kappa (\varSigma _0^{-1}) ) }{2 \epsilon \log (5 \, \kappa ( \varSigma _0^{-1} ) ) } \bigg ) \times \kappa (\varSigma _0^{-1}) \bigg ] \\= & {} O \Big [ \log \left( \frac{1}{2 \epsilon } \right) \Big ] = O \bigg [ \log \bigg ( \frac{1}{2 \times o(n^{- 3/2})} \bigg ) \bigg ] \\= & {} O \big ( \log \big [ O (n^{3/2}) \big ] \big ) = O \big ( n^{ 3 \bar{\epsilon } /2 } \big ) \le O \big ( n^{3/2 } \big ), \end{aligned}$$where \( \bar{\epsilon } \) is fixed and \( 0< \bar{\epsilon } < 1. \) Then, we can get
$$\begin{aligned} \varGamma _2\le & {} \left[ 1 - \frac{\lambda _n ( \varSigma _0 ^{-1})}{\alpha } \right] ^ m \times \displaystyle \sum _{k=1}^\infty \frac{1}{k} \ \text {tr} \left[ \left( I_n - \frac{\varSigma _0^{-1} }{ \alpha } \right) ^k \right] \\\le & {} \epsilon \times \displaystyle \sum _{i=1}^n \log \bigg ( \frac{\alpha }{\lambda _i (\varSigma _0^{-1} )} \bigg ) = o \big (n^{- 3/2 } \big ) \times O (n) \\= & {} o(n^{- 1/2 }). \end{aligned}$$
Hence, \( | \widehat{T}_2 - T_2 | = o(n^{- 1/2 })\).
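The argument above bounds the error of the randomized log-determinant estimator of Boutsidis et al. (2015): truncate \( \log \det \varSigma _0^{-1} = n \log \alpha - \sum \nolimits _{k=1}^{\infty } \frac{1}{k} \, \text {tr} [ ( I_n - \varSigma _0^{-1}/\alpha )^k ] \) at m terms, and estimate each trace with p Gaussian probes \(g_i\). A minimal sketch on a synthetic SPD matrix (our own implementation, not the paper's code; parameter values are illustrative):

```python
import numpy as np

def rand_logdet(C, alpha, m=60, p=800, seed=0):
    """Randomized log-determinant of SPD C with eigenvalues in (0, alpha).

    logdet(C) = n*log(alpha) - sum_{k=1}^m (1/k) tr[(I - C/alpha)^k],
    each trace estimated by Hutchinson probes g_i ~ N(0, I_n).
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    G = rng.standard_normal((n, p))       # probe vectors as columns
    V = G - (C @ G) / alpha               # V = (I - C/alpha) @ G
    total = 0.0
    for k in range(1, m + 1):
        total += np.sum(G * V) / (p * k)  # avg of g_i^T (I - C/alpha)^k g_i
        V = V - (C @ V) / alpha           # advance to the next matrix power
    return n * np.log(alpha) - total

rng = np.random.default_rng(4)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = rng.uniform(0.8, 1.2, size=n)
C = (Q * lam) @ Q.T                       # SPD with spectrum in [0.8, 1.2]
exact = np.linalg.slogdet(C)[1]
assert abs(rand_logdet(C, alpha=2.0) - exact) < 1.0
```

Only matrix-vector style products with C appear, which is what makes the estimator attractive for large sparse precision matrices.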
We then prove \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{-1/2}). \)
Let \( \displaystyle A= I_n - \frac{\varSigma _0^{-1}}{\alpha } \), \( \displaystyle B= I_n - \frac{ \widetilde{\varSigma }_0^{-1}}{\alpha } \), and \( A^k - B^k = Q= \{q_{k_1 k_2} \}_{n \times n}. \)
Then, \( | \widetilde{T}_2 - \widehat{T}_2 | \) can be written as
Then, we have
By Lemma 4, we have that for any \( \displaystyle k = 1, \ldots , m\), \( \Vert A^k - B^k \Vert _F^2 < \frac{1}{\alpha ^2} d_1^2 \epsilon _1^2 \, 4 n^2 M = o(n^{-5}), \) where m is a fixed integer. Then \( \Vert Q\Vert _F \ge \Vert Q\Vert _2 \ge \frac{1}{n^{1/2}} \Vert Q\Vert _1 = \frac{1}{ n^{1/2} } \max \nolimits _{1 \le j \le n} \sum \nolimits _{i=1}^n |q_{ij} | \ge \frac{1}{ n^{3/2} } \sum \nolimits _{i=1}^n \sum \nolimits _{j=1}^n | q_{ij} |. \)
Thus, for any \(g_i = ( g_{i1}, \dots , g_{in})^T \sim N( 0, I_n),\) we have
Further, for any \(p \ge 1, \sum \nolimits _{i=1}^p \frac{1}{p} \ g_i^T Qg_i = o(n^{-2})\). Then, for any \( m = O(n^{3/2}), \ \sum \nolimits _{k=1}^m \frac{1}{k} \ ( \sum \nolimits _{i=1}^p \frac{1}{p} \ g_i^T Qg_i ) = o(n^{- 1/2 })\). Hence, \( | \widetilde{T}_2 - \widehat{T}_2 | = o(n^{- 1/2 }). \)
Therefore, \( | \widetilde{T}_2 - {T}_2 | \le | \widetilde{T}_2 - \widehat{T}_2 | + | \widehat{T}_2 - T_2| = o(n^{- 1/2 }). \text { (End of proving Step 2.)} \)
Lastly, we have
\(\square \)
A.2 Proof of Theorem 2
Proof
For simplicity, we denote the exact loglikelihood \( \log f_{Y_\mathrm{O}} (\rho )\) by \(l (\rho )\) and the approximated loglikelihood from the RMLE method, \(\log _\mathrm{RMLE} f_{Y_\mathrm{O}} (\rho ) \), by \( \tilde{l} (\rho )\); their first derivatives are written \( l^{'} (\rho ) \) and \( \tilde{l} ^{'} (\rho )\), and their second derivatives \( l^{''} (\rho ) \) and \( \tilde{l} ^{''} (\rho )\), respectively.
Then, from Theorem 1 and Assumption 6, we are able to derive the following:
By conducting Taylor expansions of \( l^{'} (\rho ) \) and \( \tilde{l} ^{'} (\rho )\) at point \(\rho = \rho _1\), we can get:
By solving the equations, we can obtain:
Hence, we can obtain the following
On the other hand, from asymptotic theory of MLE, we have
Hence,
(End of proving Theorem 2.) \(\square \)
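The stability logic behind Theorem 2 — if the approximated loglikelihood is uniformly close to the exact one and both are smooth and locally concave, then their maximizers are close — can be illustrated with a toy example (the functions below are illustrative stand-ins of our own choosing, not the SAR loglikelihood):

```python
import numpy as np

rho = np.linspace(-0.9, 0.9, 2001)
l = -(rho - 0.3) ** 2                   # toy concave "exact" loglikelihood
l_tilde = l + 1e-4 * np.sin(50 * rho)   # uniformly small perturbation of l
rho_hat = rho[np.argmax(l)]
rho_tilde = rho[np.argmax(l_tilde)]
assert abs(rho_tilde - rho_hat) < 0.05  # maximizers nearly coincide
```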
Cite this article
Li, M., Kang, E.L. Randomized algorithms of maximum likelihood estimation with spatial autoregressive models for large-scale networks. Stat Comput 29, 1165–1179 (2019). https://doi.org/10.1007/s11222-019-09862-4