Optimal subsample selection for massive logistic regression with distributed data

  • Original paper
  • Published in: Computational Statistics

Abstract

With the emergence of big data, it is increasingly common for data to be distributed, i.e., stored at many distributed sites (machines or nodes) owing to data collection or business operations. We propose a distributed subsampling procedure for this setting that efficiently approximates the maximum likelihood estimator for logistic regression. We establish the consistency and asymptotic normality of the subsample estimator conditional on the full data. The optimal subsampling probabilities and optimal allocation sizes are obtained explicitly. We develop a two-step algorithm to approximate the optimal subsampling procedure. Numerical simulations and an application to airline data are presented to evaluate the performance of our subsampling method.

(Figs. 2, 3, 4 and 5 are not reproduced in this preview.)

References

  • Ai M, Yu J, Zhang H, Wang H (2020) Optimal subsampling algorithms for big data generalized linear models. Stat Sin. https://doi.org/10.5705/ss.202018.0439

  • Battey H, Fan J, Liu H, Lu J, Zhu Z (2018) Distributed testing and estimation under sparse high dimensional models. Ann Stat 46:1352–1382

  • Corbett J, Dean J, Epstein M et al (2013) Spanner: Google’s globally distributed database. ACM Trans Comput Syst 31, Article No. 8

  • Ferguson T (1996) A course in large sample theory. Chapman and Hall, New York

  • Jordan M, Lee J, Yang Y (2019) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:668–681

  • Kiefer J (1959) Optimum experimental designs. J R Stat Soc B 21:272–319

  • Ma P, Mahoney M, Yu B (2015) A statistical perspective on algorithmic leveraging. J Mach Learn Res 16:861–911

  • Schifano E, Wu J, Wang C, Yan J, Chen M (2016) Online updating of statistical inference in the big data setting. Technometrics 58:393–403

  • Shi C, Lu W, Song R (2018) A massive data framework for M-estimators with cubic-rate. J Am Stat Assoc 113:1698–1709

  • van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, London

  • Volgushev S, Chao S, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47:1634–1662

  • Wang H (2019) More efficient estimation for logistic regression with optimal subsample. J Mach Learn Res 20:1–59

  • Wang H, Zhu R, Ma P (2018) Optimal subsampling for large sample Logistic regression. J Am Stat Assoc 113:829–844

  • Wang H, Yang M, Stufken J (2019) Information-based optimal subdata selection for big data linear regression. J Am Stat Assoc 114:393–405

  • Zhang T, Ning Y, Ruppert D (2020) Optimal sampling for generalized linear models under measurement constraints. J Comput Graph Stat. https://doi.org/10.1080/10618600.2020.1778483

  • Zhao T, Cheng G, Liu H (2016) A partially linear framework for massive heterogeneous data. Ann Stat 44:1400–1437

  • Zuo L, Zhang H, Wang H, Liu L (2021) Sampling-based estimation for massive survival data with additive hazards model. Stat Med 40:441–450

Acknowledgements

The authors would like to thank the Editor, an Associate Editor and three reviewers for their constructive and insightful comments that greatly improved the manuscript. The work of Wang was supported by National Science Foundation (NSF), USA grant DMS-1812013. The work of Sun was supported in part by the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015 and 11926341) and Key Laboratory of RCSDS, CAS (No. 2008DP173182).

Author information

Corresponding author

Correspondence to Haixiang Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

If Assumptions (A.1)–(A.3) hold, then conditional on \(\mathcal {F}_n\), we have

$$\begin{aligned} \frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}, \end{aligned}$$
(7.1)

and

$$\begin{aligned} \tilde{\mathcal {H}}_{X}^{-1}=O_{P|\mathcal {F}_{n}}(1), \end{aligned}$$
(7.2)

where \(\tilde{\mathcal {H}}_{X}=\frac{\partial ^2\ell ^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\), and the probability measure in \(O_{P|\mathcal {F}_{n}}(\cdot )\) is the conditional probability measure given \(\mathcal {F}_n\).

Proof

For any \(\varvec{\beta }\in J_B \), we can derive that

$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}=\frac{\dot{\ell }(\varvec{\beta })}{n}. \end{aligned}$$
(7.3)

For the jth component of \(\dot{\ell }^*(\varvec{\beta })\), i.e., \(\dot{\ell }_j^*(\varvec{\beta })=\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{(\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}_{ik}^*)_j}{\pi ^*_{ik}}\),

$$\begin{aligned}&E\bigg \{\frac{\dot{\ell }^*_j(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}^2\\&\quad =E\bigg \{\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{(\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}_{ik}^*)_j}{\pi ^*_{ik}}- \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\bigg |\mathcal {F}_{n}\bigg \}^2\\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\bigg [\sum _{i=1}^{n_k}\frac{(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j^2}{\pi _{ik}}-\Big (\sum _{i=1}^{n_k} (\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\Big )^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}. \end{aligned}$$

By Assumption (A.2),

$$\begin{aligned} E\left\{ \frac{\dot{\ell }^*_j(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\right\} ^2 =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

Using Markov’s inequality together with (7.3), we get

$$\begin{aligned} \frac{\dot{\ell }^*(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}=O_{P|\mathcal {F}_{n}}\Bigg (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Bigg )^{1/2}. \end{aligned}$$
(7.4)

By Assumption (A.1), we have \(\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}-\frac{\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )^{1/2}\). Because \(\frac{\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=0\), it follows that (7.1) holds.
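The two facts behind (7.1) and (7.4), conditional unbiasedness (7.3) and the exact conditional variance in the middle line of the variance display, can be checked numerically. The sketch below is only an illustration under assumptions not taken from the paper: simulated Gaussian covariates with an intercept, three sites of hypothetical sizes, uniform within-site probabilities \(\pi_{ik}=1/n_k\), and sampling with replacement at each site. The helper names (`make_sites`, `subsample_score`, `exact_moments`) are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def make_sites(sizes, beta, rng):
    """Hypothetical simulated data: site k holds n_k rows (intercept + Gaussian covariates)."""
    sites = []
    for n_k in sizes:
        X = np.column_stack([np.ones(n_k), rng.normal(size=(n_k, len(beta) - 1))])
        y = rng.binomial(1, sigmoid(X @ beta)).astype(float)
        sites.append((X, y))
    return sites


def subsample_score(sites, probs, r, beta, rng):
    """dot-ell^*(beta): r_k draws with replacement at site k, each term divided by r_k * pi."""
    score = np.zeros(sites[0][0].shape[1])
    for (X, y), pi, r_k in zip(sites, probs, r):
        idx = rng.choice(len(y), size=r_k, replace=True, p=pi)
        resid = y[idx] - sigmoid(X[idx] @ beta)
        score += (resid[:, None] * X[idx] / pi[idx][:, None]).sum(axis=0) / r_k
    return score


def exact_moments(sites, probs, r, beta):
    """Conditional mean of dot-ell^*/n (equal to dot-ell/n) and the exact
    conditional variance of each of its components."""
    n = sum(len(y) for _, y in sites)
    d = sites[0][0].shape[1]
    mean, var = np.zeros(d), np.zeros(d)
    for (X, y), pi, r_k in zip(sites, probs, r):
        g = (y - sigmoid(X @ beta))[:, None] * X          # full-data score terms at site k
        mean += g.sum(axis=0)
        var += ((g**2 / pi[:, None]).sum(axis=0) - g.sum(axis=0) ** 2) / r_k
    return mean / n, var / n**2


rng = np.random.default_rng(0)
beta = np.array([0.5, -1.0, 1.0])
sites = make_sites([2000, 500, 1000], beta, rng)
probs = [np.full(len(y), 1.0 / len(y)) for _, y in sites]  # uniform within each site
r = [100, 50, 80]
n = sum(len(y) for _, y in sites)

mean, var = exact_moments(sites, probs, r, beta)
reps = 2000
draws = np.array([subsample_score(sites, probs, r, beta, rng) for _ in range(reps)]) / n
print("standardized bias:", (draws.mean(axis=0) - mean) / np.sqrt(var / reps))
print("Monte Carlo / exact variance:", draws.var(axis=0) / var)
```

The standardized bias should look like draws from a standard normal, and the variance ratio should be close to 1; the exact variance shrinks like \(1/r_k\), which is where the rate \(\sum_k n_k^2/(n^2 r_k)\) comes from.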

To prove (7.2), note that direct calculation yields

$$\begin{aligned} E\left\{ \frac{\partial ^2\ell ^*(\varvec{\beta })}{n \partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\right\} =\frac{ \partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}. \end{aligned}$$
(7.5)

For any component \(\frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) of \(\frac{\partial ^2\ell ^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) with \(1\le j_1,j_2\le p\), we can derive that

$$\begin{aligned}&E\Big \{\frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\Big \}^2 \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\bigg [\sum _{i=1}^{n_k}\frac{w^2_{ik}(\varvec{\beta })\{(\mathbf {X}_{ik}\mathbf {X}_{ik}^T)_{j_1j_2}\}^2}{\pi _{ik}}-\left( \sum _{i=1}^{n_k} \{w_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}\right) ^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^4}{\pi _{ik}}. \end{aligned}$$

By Assumption (A.2),

$$\begin{aligned} E\left\{ \frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\right\} ^2 =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

It follows from Markov’s inequality that

$$\begin{aligned} \frac{\partial ^2\ell ^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T} =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}. \end{aligned}$$
(7.6)

Combining (7.6) with Assumptions (A.1) and (A.3), we conclude that (7.2) holds. This ends the proof. \(\square \)

Proof of Theorem 1

Conditional on \(\mathcal {F}_{n}\), Assumption (A.5), Lemma 1 and (7.4) imply that \(\frac{\dot{\ell }^*(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}\rightarrow 0\) in probability. Note that the parameter space \(J_B\) is compact and that \(\hat{\varvec{\beta }}_\mathrm{MLE}\) is the unique solution to \(\frac{\dot{\ell }(\varvec{\beta })}{n}=0\). Thus, it follows from Theorem 5.9 of van der Vaart (1998) and the remark following it that, conditional on \(\mathcal {F}_{n}\), as \(n\rightarrow \infty \),

$$\begin{aligned} \Vert \tilde{\varvec{\beta }} - \hat{\varvec{\beta }}_\mathrm{MLE}\Vert =o_{P|\mathcal {F}_{n}}(1). \end{aligned}$$
(7.7)

By Taylor’s theorem (Ferguson 1996, Chapter 4), we have

$$\begin{aligned} 0=\frac{\dot{\ell }_j^*(\tilde{\varvec{\beta }})}{n}=\frac{\dot{\ell }_j^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+\frac{\partial ^2\ell _j^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})+\frac{1}{n}R_{j}, \end{aligned}$$
(7.8)

where

$$\begin{aligned} R_{j}=(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})^T\int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}). \end{aligned}$$

Note that for all \(\varvec{\beta }\),

$$\begin{aligned} \Big \Vert \frac{\partial ^2\dot{\ell }_j^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big \Vert= & {} \Big \Vert \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{P^*_{ik}(\varvec{\beta })(1-P^*_{ik}(\varvec{\beta }))(1-2P^*_{ik}(\varvec{\beta }))}{\pi ^*_{ik}} (\mathbf {X}_{ik}^*)_j\mathbf {X}_{ik}^*\mathbf {X}_{ik}^{*T}\Big \Vert \\\le & {} \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}. \end{aligned}$$

Thus,

$$\begin{aligned} \Big \Vert \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\Big \Vert \le \sum _{k=1}^K\frac{1}{2r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}} =O_{P|\mathcal {F}_{n}}(n),\quad \nonumber \\ \end{aligned}$$
(7.9)

where the last equality is from the fact that

$$\begin{aligned}&P\left( \sum _{k=1}^K\frac{1}{nr_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}\ge \tau \Big |\mathcal {F}_{n}\right) \\&\quad \le \frac{1}{n\tau }E\left( \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}\right) \\&\quad =\frac{1}{n\tau }\sum _{k=1}^K\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3 \rightarrow 0, \end{aligned}$$

as \(\tau \rightarrow \infty \), under Assumption (A.4). From (7.8) and (7.9), we have

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2)\Big \}. \end{aligned}$$
(7.10)

It follows from (7.1) and (7.2), together with (7.7) and (7.10) that

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}+o_{P|\mathcal {F}_{n}}(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ). \end{aligned}$$

Hence, \( \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )^{1/2}. \) This ends the proof. \(\square \)
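Theorem 1 says that the subsample estimator \(\tilde{\varvec{\beta }}\), the root of \(\dot{\ell }^*(\varvec{\beta })=0\), lies within \(O_{P|\mathcal {F}_{n}}\{(\sum_k n_k^2/(n^2 r_k))^{1/2}\}\) of \(\hat{\varvec{\beta }}_\mathrm{MLE}\). If \(r_k\) is taken proportional to \(n_k\) (an allocation assumed here only for illustration), that quantity equals \(r^{-1/2}\), so a 16-fold increase in \(r\) should shrink the error by roughly a factor of 4. The sketch below computes \(\tilde{\varvec{\beta }}\) by Newton–Raphson as a weighted logistic fit with weights \(1/(r_k\pi^*_{ik})\); the data, site sizes and helper names are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic_mle(X, y, w, iters=50, tol=1e-10):
    """Newton-Raphson for sum_i w_i {y_i - P_i(beta)} x_i = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        hess = (X * (w * p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, X.T @ (w * (y - p)))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def subsample_estimate(sites, probs, r, rng):
    """beta-tilde: root of dot-ell^*(beta) = 0, i.e. weights 1 / (r_k * pi_ik) on the draws."""
    Xs, ys, ws = [], [], []
    for (X, y), pi, r_k in zip(sites, probs, r):
        idx = rng.choice(len(y), size=r_k, replace=True, p=pi)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r_k * pi[idx]))
    return weighted_logistic_mle(np.vstack(Xs), np.concatenate(ys), np.concatenate(ws))


rng = np.random.default_rng(1)
beta_true = np.array([0.5, -1.0, 1.0])
sites = []
for n_k in [20000, 5000, 10000]:                            # hypothetical site sizes
    X = np.column_stack([np.ones(n_k), rng.normal(size=(n_k, 2))])
    sites.append((X, rng.binomial(1, sigmoid(X @ beta_true)).astype(float)))
n = sum(len(y) for _, y in sites)
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_hat = weighted_logistic_mle(X_all, y_all, np.ones(n))  # full-data MLE
probs = [np.full(len(y), 1.0 / len(y)) for _, y in sites]


def mean_error(r_total, reps=20):
    r = [max(1, round(r_total * len(y) / n)) for _, y in sites]  # r_k proportional to n_k
    errs = [np.linalg.norm(subsample_estimate(sites, probs, r, rng) - beta_hat) for _ in range(reps)]
    return float(np.mean(errs))


err_small, err_large = mean_error(200), mean_error(3200)
print(err_small, err_large, err_small / err_large)   # ratio expected to be roughly 4
```

Newton–Raphson is used here only because the weighted log-likelihood is smooth and concave; any logistic solver that accepts per-observation weights would give the same \(\tilde{\varvec{\beta }}\).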

Proof of Theorem 2

Note that

$$\begin{aligned} \frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}^*}{\pi ^*_{ik}} =\sum _{k=1}^{K}\sum _{i=1}^{r_k}\varvec{\eta }_{ik}. \end{aligned}$$
(7.11)

Given \(\mathcal {F}_{n}\), the vectors \( \{\varvec{\eta }_{ik}: i=1,\ldots ,r_k,k=1,\ldots ,K\}\) are independent, with

$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k}Var(\varvec{\eta }_{ik}|\mathcal {F}_{n})\nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}} -\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2\nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}} +O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) \end{aligned}$$
(7.12)
$$\begin{aligned}&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}}+o_P(1), \end{aligned}$$
(7.13)

where (7.12) and (7.13) hold by Assumptions (A.2) and (A.5), respectively. Meanwhile, for every \(\varepsilon > 0\),

$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k}E\{\Vert \varvec{\eta }_{ik}\Vert ^2I(\Vert \varvec{\eta }_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}\}\\&\quad \le \sum _{k=1}^{K}\sum _{i=1}^{r_k}E\Big \{\Vert \varvec{\eta }_{ik}\Vert ^2\cdot \frac{\Vert \varvec{\eta }_{ik}\Vert }{\varepsilon }\Big |\mathcal {F}_{n}\Big \}\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k}E(\Vert \varvec{\eta }_{ik}\Vert ^3|\mathcal {F}_{n})\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k}\frac{1}{n^3r_k^3}\sum _{i=1}^{n_k}\frac{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2}\\&\quad \le \frac{1}{\varepsilon }\sum _{k=1}^{K}\frac{1}{n^3r_k^2}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2}.\\ \end{aligned}$$

By Assumptions (A.5) and (A.6), we can derive that

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k}E\{\Vert \varvec{\eta }_{ik}\Vert ^2I(\Vert \varvec{\eta }_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}\}\le \frac{1}{\varepsilon }O_P\Big (\sum _{k=1}^{K}\frac{n_k^3}{n^3r_k^2}\Big )\le \frac{1}{\varepsilon }O_P\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )=o_P(1). \end{aligned}$$

In view of (7.11) and (7.13), together with the Lindeberg–Feller central limit theorem (Proposition 2.27 of van der Vaart 1998) and Slutsky’s theorem, conditional on \(\mathcal {F}_{n}\), as \(n \rightarrow \infty \) and \(r_k \rightarrow \infty \), we have

$$\begin{aligned} \frac{1}{n}\varvec{\Gamma }^{-1/2}\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$
(7.14)

From Lemma 1, (7.10) and Theorem 1,

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
(7.15)

It can be checked that

$$\begin{aligned}&-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}-(\tilde{\mathcal {H}}_{X}^{-1}-{\mathcal {H}}_{X}^{-1})\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+[{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}_{X}^{-1}]\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

Hence,

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
(7.16)
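The matrix step above rests on the identity \(-(\tilde{\mathcal {H}}_{X}^{-1}-\mathcal {H}_{X}^{-1})=\mathcal {H}_{X}^{-1}(\tilde{\mathcal {H}}_{X}-\mathcal {H}_{X})\tilde{\mathcal {H}}_{X}^{-1}\), which holds for any pair of invertible matrices (multiply out the right-hand side). A quick numerical check, using arbitrary stand-in matrices rather than the paper's Hessians:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
H = A @ A.T + 4.0 * np.eye(4)                   # stand-in for H_X: symmetric, invertible
H_tilde = H + 0.1 * rng.normal(size=(4, 4))     # stand-in for tilde-H_X: a small perturbation
H_inv, H_tilde_inv = np.linalg.inv(H), np.linalg.inv(H_tilde)

lhs = -(H_tilde_inv - H_inv)                    # -(tilde-H^{-1} - H^{-1})
rhs = H_inv @ (H_tilde - H) @ H_tilde_inv       # H^{-1} (tilde-H - H) tilde-H^{-1}
print(np.max(np.abs(lhs - rhs)))                # zero up to rounding error
```

Since \(\tilde{\mathcal {H}}_{X}-\mathcal {H}_{X}\) is of order \((\sum_k n_k^2/(n^2r_k))^{1/2}\) by (7.6), and so is \(\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})/n\) by (7.1), the correction term is of second order, which is what the last lines of the display exploit.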

By Assumption (A.2), we have

$$\begin{aligned} \varvec{\Sigma }=\mathcal {H}_{X}^{-1}\varvec{\Gamma }\mathcal {H}_{X}^{-1}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
(7.17)

Thus, (7.16) and (7.17) yield that

$$\begin{aligned}&\varvec{\Sigma }^{-1/2}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}\varvec{\Gamma }^{-1/2}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\nonumber .\\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}\varvec{\Gamma }^{-1/2}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+o_P(1). \end{aligned}$$
(7.18)

Note that

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}({\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2})^T={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}{\varvec{\Gamma }}^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}=\mathbf {I}.\nonumber \\ \end{aligned}$$
(7.19)

By (7.14) and (7.17)–(7.19), together with Slutsky’s theorem, we obtain that, as \(n \rightarrow \infty \),

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$

This ends the proof. \(\square \)
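For a fitted model, the sandwich \(\varvec{\Sigma }=\mathcal {H}_{X}^{-1}\varvec{\Gamma }\mathcal {H}_{X}^{-1}\) can be evaluated directly. Since their definitions are not restated in this appendix, the sketch below assumes that \(\mathcal {H}_{X}\) is the full-data analogue of \(\tilde{\mathcal {H}}_{X}\), namely \(-n^{-1}\sum_k\sum_i P_{ik}(1-P_{ik})\mathbf {X}_{ik}\mathbf {X}_{ik}^T\) at \(\hat{\varvec{\beta }}_\mathrm{MLE}\) (its sign cancels in the sandwich), and that \(\varvec{\Gamma }\) is the leading term in (7.13). Data, sizes and function names are hypothetical. Because \(\varvec{\Gamma }\) scales like \(1/r_k\), doubling every \(r_k\) exactly halves \(\varvec{\Sigma }\), in line with (7.17).

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def logistic_mle(X, y, iters=50):
    """Plain Newton-Raphson for the full-data MLE."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        beta += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))
    return beta


def asymptotic_covariance(sites, probs, r, beta_hat):
    """Sigma = H^{-1} Gamma H^{-1}; H is taken with a positive sign (it cancels)."""
    n = sum(len(y) for _, y in sites)
    d = sites[0][0].shape[1]
    H, Gamma = np.zeros((d, d)), np.zeros((d, d))
    for (X, y), pi, r_k in zip(sites, probs, r):
        p = sigmoid(X @ beta_hat)
        H += (X * (p * (1 - p))[:, None]).T @ X / n
        Gamma += (X * ((y - p) ** 2 / pi)[:, None]).T @ X / (n**2 * r_k)
    H_inv = np.linalg.inv(H)
    return H_inv @ Gamma @ H_inv


rng = np.random.default_rng(2)
beta_true = np.array([0.5, -1.0, 1.0])
sites = []
for n_k in [20000, 5000, 10000]:                            # hypothetical site sizes
    X = np.column_stack([np.ones(n_k), rng.normal(size=(n_k, 2))])
    sites.append((X, rng.binomial(1, sigmoid(X @ beta_true)).astype(float)))
beta_hat = logistic_mle(np.vstack([X for X, _ in sites]), np.concatenate([y for _, y in sites]))
probs = [np.full(len(y), 1.0 / len(y)) for _, y in sites]
r = [400, 100, 200]

Sigma = asymptotic_covariance(sites, probs, r, beta_hat)
print("approximate standard errors of beta-tilde:", np.sqrt(np.diag(Sigma)))
```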

Proof of Theorem 3

It can be shown that

$$\begin{aligned} tr(\varvec{\Gamma })= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}tr\left( \frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}}\right) \nonumber \\= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}\nonumber \\= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left[ \sum _{i=1}^{n_k}\pi _{ik}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}\right] \nonumber \\\ge & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left[ \sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right] ^2 \end{aligned}$$
(7.20)
$$\begin{aligned}= & {} \frac{1}{n^2}\frac{1}{r}\sum _{k=1}^{K}r_k\sum _{k=1}^{K}\frac{\big [\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \big ]^2}{r_k}\nonumber \\\ge & {} \frac{1}{n^2r}\left[ \sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right] ^2, \end{aligned}$$
(7.21)

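where the step introducing \(\sum _{i=1}^{n_k}\pi _{ik}\) uses \(\sum _{i=1}^{n_k}\pi _{ik}=1\), the step introducing \(\sum _{k=1}^{K}r_k\) uses \(\sum _{k=1}^{K}r_k=r\), and the inequalities (7.20) and (7.21) follow from the Cauchy–Schwarz inequality. Equality holds in (7.20) if and only if \(\pi _{ik}\propto |Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \), and in (7.21) if and only if \(r_k\propto \sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \). This ends the proof. \(\square \)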
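Theorem 3 amounts to two proportionality rules: within site \(k\), \(\pi_{ik}\propto a_{ik}\), and across sites, \(r_k\propto \sum_i a_{ik}\), where \(a_{ik}=|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \). The sketch below computes both rules and checks that they attain the lower bound \((\sum_k\sum_i a_{ik})^2/(n^2 r)\) from (7.21), while uniform sampling with proportional allocation does not. For brevity it plugs in the true coefficient in place of \(\hat{\varvec{\beta }}_\mathrm{MLE}\) (the Cauchy–Schwarz argument holds for any positive \(a_{ik}\)) and keeps \(r_k\) real-valued; in practice \(r_k\) must be rounded to integers. Data and names are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def trace_gamma(a, probs, r, n):
    """tr(Gamma) = n^{-2} sum_k r_k^{-1} sum_i a_ik^2 / pi_ik, with a_ik = |Y - P| ||X||."""
    return sum((a_k**2 / pi).sum() / r_k for a_k, pi, r_k in zip(a, probs, r)) / n**2


def optimal_design(a, r_total):
    """Theorem 3: pi_ik proportional to a_ik within site k; r_k proportional to sum_i a_ik."""
    totals = np.array([a_k.sum() for a_k in a])
    probs = [a_k / a_k.sum() for a_k in a]
    r = r_total * totals / totals.sum()                  # real-valued; round in practice
    return probs, r


rng = np.random.default_rng(3)
beta = np.array([0.5, -1.0, 1.0])                        # stand-in for beta-hat_MLE
a, sizes = [], [20000, 5000, 10000]                      # hypothetical site sizes
for k, n_k in enumerate(sizes):
    X = np.column_stack([np.ones(n_k), rng.normal(loc=0.5 * k, size=(n_k, 2))])
    y = rng.binomial(1, sigmoid(X @ beta))
    a.append(np.abs(y - sigmoid(X @ beta)) * np.linalg.norm(X, axis=1))
n, r_total = sum(sizes), 1000

probs_opt, r_opt = optimal_design(a, r_total)
probs_unif = [np.full(n_k, 1.0 / n_k) for n_k in sizes]
r_unif = [r_total * n_k / n for n_k in sizes]

bound = sum(a_k.sum() for a_k in a) ** 2 / (n**2 * r_total)
tr_opt = trace_gamma(a, probs_opt, r_opt, n)
tr_unif = trace_gamma(a, probs_unif, r_unif, n)
print(tr_opt / bound, tr_unif / bound)                   # the first ratio equals 1
```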

Next, we establish two lemmas that will be used in the proofs of Theorems 4 and 5.

Lemma 2

Under Assumptions (A.4) and (A.7), for \(l=2\) and \(4\),

$$\begin{aligned} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^l}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-1}), \end{aligned}$$
(7.22)

and

$$\begin{aligned} \frac{1}{n^3}\sum _{k=1}^{K}\frac{1}{r_k^2(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-2}). \end{aligned}$$
(7.23)

Proof

It follows from the expressions of \(r_k(\tilde{\varvec{\beta }}_0)\) and \(\pi _{ik}(\tilde{\varvec{\beta }}_0)\) that

$$\begin{aligned}&\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^l}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}\nonumber \\&\quad =\frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^l\nonumber \\&\quad =\frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^{l-1}}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}\cdot \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\quad \le \frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}(1+e^{\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}+e^{-\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0})\end{aligned}$$
(7.24)
$$\begin{aligned}&\quad \le \frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}(1+2e^{\lambda \Vert \mathbf {X}_{ik}\Vert })\nonumber \\&\quad \le \frac{3}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }, \end{aligned}$$
(7.25)

where (7.24) holds by Assumption (A.4). Note that

$$\begin{aligned} E\{\Vert \mathbf {X}_{ik}\Vert ^{l-1}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\}\le \{E(\Vert \mathbf {X}_{ik}\Vert ^{2(l-1)})E(e^{2\lambda \Vert \mathbf {X}_{ik}\Vert })\}^{1/2}<\infty . \end{aligned}$$
(7.26)

Hence, (7.22) follows from (7.25), (7.26) and the law of large numbers. Analogously, we can prove that (7.23) holds. This ends the proof. \(\square \)
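Two elementary facts drive the first equality and the inequality in (7.24). Assuming, as that first equality indicates, that \(\pi _{ik}(\tilde{\varvec{\beta }}_0)\) and \(r_k(\tilde{\varvec{\beta }}_0)\) are the Theorem 3 expressions with \(\hat{\varvec{\beta }}_\mathrm{MLE}\) replaced by the pilot estimate \(\tilde{\varvec{\beta }}_0\), write \(a_{ik}=|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \) and \(t_{ik}=\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0\). Then

$$\begin{aligned} r_k(\tilde{\varvec{\beta }}_0)\,\pi _{ik}(\tilde{\varvec{\beta }}_0)=r\,\frac{\sum _{i'=1}^{n_k}a_{i'k}}{\sum _{k'=1}^{K}\sum _{i'=1}^{n_{k'}}a_{i'k'}}\cdot \frac{a_{ik}}{\sum _{i'=1}^{n_k}a_{i'k}}=\frac{r\,a_{ik}}{\sum _{k'=1}^{K}\sum _{i'=1}^{n_{k'}}a_{i'k'}}, \end{aligned}$$

and, because \(Y_{ik}\in \{0,1\}\) gives \(|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\in \{P_{ik}(\tilde{\varvec{\beta }}_0),1-P_{ik}(\tilde{\varvec{\beta }}_0)\}\) with \(P_{ik}(\tilde{\varvec{\beta }}_0)=e^{t_{ik}}/(1+e^{t_{ik}})\),

$$\begin{aligned} \frac{1}{P_{ik}(\tilde{\varvec{\beta }}_0)}=1+e^{-t_{ik}},\qquad \frac{1}{1-P_{ik}(\tilde{\varvec{\beta }}_0)}=1+e^{t_{ik}},\qquad \text {so}\qquad \frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}\le 1+e^{t_{ik}}+e^{-t_{ik}}\le 1+2e^{|t_{ik}|}. \end{aligned}$$

The passage to (7.25) then needs \(|t_{ik}|\le \lambda \Vert \mathbf {X}_{ik}\Vert \), which follows from the Cauchy–Schwarz inequality whenever \(\Vert \tilde{\varvec{\beta }}_0\Vert \le \lambda \); reading \(\lambda \) as such a bound is an assumption here, since its definition is not restated in this appendix.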

Lemma 3

If Assumptions (A.1), (A.4) and (A.7) hold, then conditional on \(\mathcal {F}_n\) we have

$$\begin{aligned} \frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}(r^{-1/2}), \end{aligned}$$
(7.27)

and

$$\begin{aligned} \{\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\}^{-1} = O_{P|\mathcal {F}_{n}}(1), \end{aligned}$$
(7.28)

where \(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}=\frac{\partial ^2\ell ^*_{\tilde{\varvec{\beta }}_0}(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\).

Proof

For any \(\varvec{\beta }\in J_B \), we can derive that

$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\bigg \}=\frac{\dot{\ell }(\varvec{\beta })}{n}. \end{aligned}$$
(7.29)

For the jth component \(\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })\) of \(\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })\),

$$\begin{aligned}&E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })}{n}-\frac{\dot{\ell }_{j}(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\bigg \}^2\\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\bigg [\sum _{i=1}^{n_k}\frac{(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})^2_j}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}- \Big \{\sum _{i=1}^{n_k}(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\Big \}^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}.\\ \end{aligned}$$

By Lemma 2,

$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}^2 =O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
(7.30)

In view of Markov’s inequality and Assumption (A.1), (7.27) follows from (7.29) and (7.30), because \(\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})=0\).

In a similar manner, we obtain

$$\begin{aligned} E\Big \{\frac{\partial ^2\ell ^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\Big \}=\frac{\partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}. \end{aligned}$$
(7.31)

For any component \(\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^{*j_1j_2}(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) of \(\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) with \(1\le j_1,j_2\le p\), it can be shown that

$$\begin{aligned}&E\Big \{\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^{*j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\Big \}^2 \nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\bigg [\sum _{i=1}^{n_k}\frac{w^2_{ik}(\varvec{\beta })\{(\mathbf {X}_{ik}\mathbf {X}_{ik}^T)_{j_1j_2}\}^2}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\left( \sum _{i=1}^{n_k} \{w_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}\right) ^2\bigg ]\nonumber \\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^4}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-1}), \end{aligned}$$
(7.32)

where the last equality in (7.32) holds by Lemma 2. From (7.31), (7.32) and Markov’s inequality, we know that (7.28) holds. This ends the proof. \(\square \)

Proof of Theorem 4

It follows from (7.29), (7.30) and Markov’s inequality that, given \(\mathcal {F}_{n}\),

$$\begin{aligned} \frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}\rightarrow 0 \end{aligned}$$

in probability. Thus, arguing as in the proof of Theorem 1, conditional on \(\mathcal {F}_{n}\),

$$\begin{aligned} \Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert =o_P(1), \end{aligned}$$
(7.33)

which ensures that \(\breve{\varvec{\beta }}\) is close to \(\hat{\varvec{\beta }}_\mathrm{MLE}\) when \(r\) is large enough. By Taylor’s theorem (Ferguson 1996, Chapter 4),

$$\begin{aligned} 0=\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\breve{\varvec{\beta }})}{n}=\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0j}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})+\frac{1}{n}R_{\tilde{\varvec{\beta }}_0j}, \end{aligned}$$
(7.34)

where

$$\begin{aligned} R_{\tilde{\varvec{\beta }}_0j}=(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})^T\int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}). \end{aligned}$$

Note that for all \(\varvec{\beta }\),

$$\begin{aligned} \Big \Vert \frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big \Vert= & {} \Big \Vert \sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{P^*_{ik}(\varvec{\beta })\{1-P^*_{ik}(\varvec{\beta })\}\{1-2P^*_{ik}(\varvec{\beta })\}}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)} (\mathbf {X}^*_{ik})_j\mathbf {X}^*_{ik}\mathbf {X}^{*T}_{ik}\Big \Vert \\\le & {} \sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\Vert \mathbf {X}^*_{ik}\Vert ^3}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)}, \end{aligned}$$

and by Assumption (A.4),

$$\begin{aligned} P\left( \frac{1}{n}\sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\Vert \mathbf {X}^*_{ik}\Vert ^3}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)}\ge \tau \Big |\mathcal {F}_{n}\right)\le & {} \frac{\frac{1}{n}\sum _{k=1}^K\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3}{\tau }\rightarrow 0 \end{aligned}$$

in probability as \(\tau \rightarrow \infty \). Thus,

$$\begin{aligned} \left\| \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\right\| =O_{P|\mathcal {F}_{n}}(n). \end{aligned}$$
(7.35)

By (7.34) and (7.35),

$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\{\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\}^{-1}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2)\Big \}. \end{aligned}$$
(7.36)

Based on (7.27), (7.28), (7.33) and (7.36), we have

$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})+o_{P|\mathcal {F}_{n}}(\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ). \end{aligned}$$

Hence, \(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})\). This ends the proof. \(\square \)
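The pilot estimate \(\tilde{\varvec{\beta }}_0\) used in Lemmas 2 and 3 and Theorem 4 comes from a first-stage subsample; the second stage plugs it into the Theorem 3 rules and solves \(\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })=0\) for \(\breve{\varvec{\beta }}\). The sketch below is one possible implementation under assumptions that are ours rather than the paper's: the pilot uses uniform within-site sampling with \(r_{0k}\) proportional to \(n_k\), the pilot draws are discarded rather than combined with the second stage, and \(r_k(\tilde{\varvec{\beta }}_0)\) is rounded to an integer. Data, sizes and helper names (`draw`, `two_step`) are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic_mle(X, y, w, iters=50):
    """Newton-Raphson for sum_i w_i {y_i - P_i(beta)} x_i = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        hess = (X * (w * p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (w * (y - p)))
    return beta


def draw(sites, probs, r, rng):
    """Pool the site subsamples, weighting each draw by 1 / (r_k * pi_ik)."""
    Xs, ys, ws = [], [], []
    for (X, y), pi, r_k in zip(sites, probs, r):
        idx = rng.choice(len(y), size=int(r_k), replace=True, p=pi)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r_k * pi[idx]))
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(ws)


def two_step(sites, r0, r, rng):
    n = sum(len(y) for _, y in sites)
    # Step 1: pilot estimate beta_0 from uniform within-site subsamples (r0_k ~ n_k).
    r0_k = [max(1, round(r0 * len(y) / n)) for _, y in sites]
    uniform = [np.full(len(y), 1.0 / len(y)) for _, y in sites]
    beta0 = weighted_logistic_mle(*draw(sites, uniform, r0_k, rng))
    # Step 2: Theorem 3 rules evaluated at beta_0, then solve the weighted score equation.
    a = [np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1) for X, y in sites]
    probs = [a_k / a_k.sum() for a_k in a]
    totals = np.array([a_k.sum() for a_k in a])
    r_k = np.maximum(1, np.round(r * totals / totals.sum())).astype(int)
    return weighted_logistic_mle(*draw(sites, probs, r_k, rng)), beta0


rng = np.random.default_rng(4)
beta_true = np.array([0.5, -1.0, 1.0])
sites = []
for k, n_k in enumerate([20000, 5000, 10000]):              # hypothetical sites
    X = np.column_stack([np.ones(n_k), rng.normal(loc=0.5 * k, size=(n_k, 2))])
    sites.append((X, rng.binomial(1, sigmoid(X @ beta_true)).astype(float)))
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_hat = weighted_logistic_mle(X_all, y_all, np.ones(len(y_all)))

beta_breve, beta0 = two_step(sites, r0=300, r=1000, rng=rng)
print("pilot error:", np.linalg.norm(beta0 - beta_hat))
print("two-step error:", np.linalg.norm(beta_breve - beta_hat))
```

Only the site-level summaries \(\sum_i a_{ik}\) need to be communicated to set the allocation \(r_k(\tilde{\varvec{\beta }}_0)\); the within-site probabilities can be computed locally at each site.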

Proof of Theorem 5

Let

$$\begin{aligned} \frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}^*_{ik}}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)} =\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}.\nonumber \\ \end{aligned}$$
(7.37)

Given \(\mathcal {F}_{n}\) and \(\tilde{\varvec{\beta }}_0\), the \(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\) are independent random vectors whose conditional covariance matrices satisfy

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}Var(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}^T_{ik}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}\nonumber \\&-\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2.\nonumber \\ \end{aligned}$$
(7.38)
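The variance formula (7.38) can be checked by simulation. The sketch below assumes NumPy and replaces the score contributions \(\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\) with hypothetical vectors `g`. It also reads the squared site sum in the second term as an outer product, which is the reading required for a covariance matrix. The site sizes, subsample sizes `r_k` and probabilities `pi` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2
n_k = [200, 300, 500]                 # hypothetical site sizes
r_k = [20, 30, 50]                    # hypothetical per-site subsample sizes
n = sum(n_k)
g = [rng.normal(size=(m, d)) for m in n_k]   # stand-ins for {Y_ik - P_ik} X_ik
pi = []
for gk in g:
    w = np.abs(gk).sum(axis=1) + 0.1
    pi.append(w / w.sum())            # within-site sampling probabilities

def variance_formula():
    """Right-hand side of (7.38), squared site sums read as outer products."""
    V = np.zeros((d, d))
    for gk, pk, rk in zip(g, pi, r_k):
        s = gk.sum(axis=0)
        V += ((gk.T / pk) @ gk - np.outer(s, s)) / (n ** 2 * rk)
    return V

def one_draw():
    """One realisation of sum_k sum_i eta_ik, with eta_ik = g*_ik / (n r_k pi*_ik)."""
    total = np.zeros(d)
    for gk, pk, rk in zip(g, pi, r_k):
        idx = rng.choice(len(pk), size=rk, replace=True, p=pk)
        total += (gk[idx] / pk[idx, None]).sum(axis=0) / (n * rk)
    return total

draws = np.array([one_draw() for _ in range(20000)])
V_mc = np.cov(draws, rowvar=False)    # Monte Carlo covariance
V_th = variance_formula()             # formula (7.38)
print(V_mc, V_th, sep="\n")
```

The subtracted outer-product term is present because the site-level score sums need not vanish individually. Only their total over all sites is zero at \(\hat{\varvec{\beta }}_\mathrm{MLE}\).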

Note that

$$\begin{aligned}&\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2\\&\quad =\frac{1}{rn^2}\sum _{k=1}^{K}\frac{(\sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik})^2}{\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \\&\quad \le \frac{1}{rn^2}\sum _{k=1}^{K}\frac{(\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert )^2}{\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \\&\quad =\frac{1}{rn^2}\left( \sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right) ^2\\&\quad \le \frac{1}{r}\left( \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \right) ^2\\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
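The chain of inequalities above can be checked numerically. The sketch below bounds the norm of the subtracted term, using \(\Vert \mathbf {s}\mathbf {s}^T\Vert =\Vert \mathbf {s}\Vert ^2\), and takes \(r_k=r A_k/A\) with \(A_k=\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}|\Vert \mathbf {X}_{ik}\Vert \) and \(A=\sum _k A_k\), as in the first equality. The residuals `e` are hypothetical draws in \([-1,1]\) and NumPy is assumed. The allocation is kept real-valued for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
d, r = 3, 200
n_k = [300, 700, 1000]
n = sum(n_k)
# Hypothetical covariates X_ik and residuals e_ik = Y_ik - P_ik, |e_ik| <= 1
X = [rng.normal(size=(m, d)) for m in n_k]
e = [rng.uniform(-1.0, 1.0, size=m) for m in n_k]
a = [np.abs(ek) * np.linalg.norm(Xk, axis=1) for ek, Xk in zip(e, X)]  # |e_ik| ||X_ik||
A = sum(ak.sum() for ak in a)
r_k = [r * ak.sum() / A for ak in a]          # real-valued allocation r_k = r A_k / A

# Norm of the subtracted term: (1/n^2) sum_k ||sum_i e_ik X_ik||^2 / r_k
lhs = sum(np.linalg.norm((ek[:, None] * Xk).sum(axis=0)) ** 2 / rk
          for ek, Xk, rk in zip(e, X, r_k)) / n ** 2
middle = A ** 2 / (r * n ** 2)                # (1/(r n^2)) (sum |e| ||X||)^2
upper = (sum(np.linalg.norm(Xk, axis=1).sum() for Xk in X) / n) ** 2 / r
print(lhs, middle, upper)
```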

Combining (7.38) with the preceding bound, as \(r\rightarrow \infty \),

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}Var(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}^T_{ik}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}+O_{P|\mathcal {F}_{n}}(r^{-1})\nonumber \\= & {} \varvec{\Gamma }^{\tilde{\varvec{\beta }}_{0}}+o_P(1). \end{aligned}$$
(7.39)

Meanwhile, to verify the Lindeberg condition, note that for every \(\varepsilon > 0\),

$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2I(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\}\\&\quad \le \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\Big \{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2\cdot \frac{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert }{\varepsilon }\Big |\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\Big \}\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^3|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{1}{n^3r_k^3(\tilde{\varvec{\beta }}_0)}\sum _{j=1}^{n_k}\frac{|Y_{jk}-P_{jk}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{jk}\Vert ^3}{\pi _{jk}^2(\tilde{\varvec{\beta }}_0)}\\&\quad \le \frac{1}{\varepsilon }\frac{1}{n^3}\sum _{k=1}^{K}\frac{1}{r_k^2(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}. \end{aligned}$$

By Lemma 2, as \(r\rightarrow \infty \), we have

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2I(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\}\le \frac{1}{\varepsilon }O_{P|\mathcal {F}_{n}}(r^{-2})=o_P(1). \end{aligned}$$
(7.40)
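For fixed data and fixed site shares \(r_k/r\), the final bound before (7.40) scales exactly like \(r^{-2}\). The Python sketch below computes that bound and confirms that doubling \(r\) divides it by four. The data, the probabilities proportional to \(\Vert \mathbf {X}_{ik}\Vert \), the shares and \(\varepsilon =0.1\) are hypothetical choices. The stochastic order in (7.40) itself comes from Lemma 2 and is not reproduced here.

```python
import numpy as np

def lindeberg_bound(X_sites, pi_sites, r_sites, n, eps):
    """(1/eps) * (1/n^3) * sum_k r_k^{-2} * sum_i ||X_ik||^3 / pi_ik^2."""
    total = 0.0
    for Xk, pk, rk in zip(X_sites, pi_sites, r_sites):
        total += np.sum(np.linalg.norm(Xk, axis=1) ** 3 / pk ** 2) / rk ** 2
    return total / (eps * n ** 3)

rng = np.random.default_rng(5)
n_k = [300, 500, 700]
n = sum(n_k)
X_sites = [rng.normal(size=(m, 3)) for m in n_k]
pi_sites = []
for Xk in X_sites:
    w = np.linalg.norm(Xk, axis=1)    # hypothetical probabilities proportional to ||X_ik||
    pi_sites.append(w / w.sum())
shares = np.array(n_k) / n            # hypothetical allocation shares r_k / r

bounds = [lindeberg_bound(X_sites, pi_sites, r * shares, n, eps=0.1) for r in (100, 200, 400)]
ratios = [bounds[j + 1] / bounds[j] for j in range(2)]
print(ratios)
```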

It follows from (7.37), (7.39) and (7.40), together with the Lindeberg–Feller central limit theorem (Proposition 2.27 of van der Vaart 1998) and Slutsky’s theorem, that conditional on \(\mathcal {F}_{n}\), as \(n \rightarrow \infty \) and \(r \rightarrow \infty \),

$$\begin{aligned} \frac{1}{n}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{-1/2}\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$
(7.41)

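By Lemma 3, (7.36) and the rate \(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})\) established in the preceding proof, we get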

$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(r^{-1})\Big \} \end{aligned}$$
(7.42)

Note that

$$\begin{aligned}&-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}-(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}-{\mathcal {H}}_{X}^{-1})\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+\Bigg [{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Bigg ]\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}(r^{-1/2})O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}(r^{-1/2})\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
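The second equality in the display above uses the resolvent-type identity \(-(\tilde{\mathcal {H}}^{-1}-\mathcal {H}^{-1})=\mathcal {H}^{-1}(\tilde{\mathcal {H}}-\mathcal {H})\tilde{\mathcal {H}}^{-1}\), which holds for any two invertible matrices. A short NumPy check follows. The matrices are hypothetical stand-ins for \(\mathcal {H}_{X}\) and \(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\).

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
B = rng.normal(size=(d, d))
H = B @ B.T + d * np.eye(d)           # hypothetical full-data Hessian H_X (positive definite)
E = 0.05 * rng.normal(size=(d, d))
H_tilde = H + (E + E.T) / 2           # hypothetical subsample Hessian: small symmetric perturbation
H_inv, H_tilde_inv = np.linalg.inv(H), np.linalg.inv(H_tilde)

lhs = -(H_tilde_inv - H_inv)
rhs = H_inv @ (H_tilde - H) @ H_tilde_inv
print(np.max(np.abs(lhs - rhs)))      # should be at floating-point round-off level
```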

Hence, (7.42) and (7.22) yield that

$$\begin{aligned}&\varvec{\Sigma }^{-1/2}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})=-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{-1/2}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}(r^{-1/2}). \end{aligned}$$

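Since all matrices involved are symmetric and \(\varvec{\Sigma }={\mathcal {H}}_{X}^{-1}\varvec{\Gamma }{\mathcal {H}}_{X}^{-1}\), a direct calculation gives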

$$\begin{aligned}&{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\{{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\}^{T}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}\varvec{\Gamma }{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}+{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1} (\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad =\mathbf {I}+{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1} (\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}. \end{aligned}$$
(7.43)
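Equation (7.43) is an exact algebraic identity once \(\varvec{\Sigma }={\mathcal {H}}_{X}^{-1}\varvec{\Gamma }{\mathcal {H}}_{X}^{-1}\). The NumPy sketch below forms symmetric square roots through an eigendecomposition and checks both \(\mathbf {I}\) at \(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}=\varvec{\Gamma }\) and the perturbed form. The helper `sym_power` and all matrices are hypothetical.

```python
import numpy as np

def sym_power(M, p):
    """M^p for a symmetric positive definite M, via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.T

rng = np.random.default_rng(7)
d = 3
B, C = rng.normal(size=(d, d)), rng.normal(size=(d, d))
H_inv = np.linalg.inv(B @ B.T + d * np.eye(d))   # hypothetical H_X^{-1}
Gamma = C @ C.T + np.eye(d)                      # hypothetical Gamma
Sigma = H_inv @ Gamma @ H_inv                    # Sigma = H_X^{-1} Gamma H_X^{-1}
S = sym_power(Sigma, -0.5)                       # Sigma^{-1/2}

M0 = S @ H_inv @ sym_power(Gamma, 0.5)           # case Gamma^{beta_0} = Gamma
D = 0.01 * rng.normal(size=(d, d))
Gamma_pilot = Gamma + (D + D.T) / 2              # hypothetical Gamma^{beta_0} close to Gamma
M = S @ H_inv @ sym_power(Gamma_pilot, 0.5)
lhs = M @ M.T
rhs = np.eye(d) + S @ H_inv @ (Gamma_pilot - Gamma) @ H_inv @ S
```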

For the distance between \(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}\) and \(\varvec{\Gamma }\), we have

$$\begin{aligned} \Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \le \frac{1}{n^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2\left| \frac{1}{r_k(\tilde{\varvec{\beta }}_0)\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\frac{1}{r_k(\hat{\varvec{\beta }}_\mathrm{MLE})\pi _{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})}\right| . \end{aligned}$$
(7.44)

A straightforward calculation yields that

$$\begin{aligned}&\Big |\frac{1}{r_k(\tilde{\varvec{\beta }}_0)\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\frac{1}{r_k(\hat{\varvec{\beta }}_\mathrm{MLE})\pi _{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})}\Big |\nonumber \\&\quad =\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \cdot r}-\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \cdot r}\bigg |\nonumber \\&\quad \le \frac{1}{r}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert } -\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\bigg |\nonumber \\&\quad \quad +\frac{1}{r}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert } -\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\bigg |\nonumber \\&\quad \le \frac{1}{r}\bigg |\frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}-\frac{1}{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert }{\Vert \mathbf {X}_{ik}\Vert }\nonumber \\&\qquad +\frac{1}{r}\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|P_{ik}(\tilde{\varvec{\beta }}_0)-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }. \end{aligned}$$
(7.45)
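The first equality in (7.45) corresponds to the allocation \(r_k\pi _{ik}=r\,|Y_{ik}-P_{ik}|\Vert \mathbf {X}_{ik}\Vert /\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}|\Vert \mathbf {X}_{ik}\Vert \). The sketch below assumes this allocation is split into a site size \(r_k\) proportional to the site total and a within-site probability \(\pi _{ik}\), and checks the resulting expression for \(1/(r_k\pi _{ik})\) with NumPy. The helper `allocation`, the data and the coefficient vector are hypothetical, and \(r_k\) is left real-valued.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def allocation(sites, beta, r):
    """Hypothetical helper: a_ik = |Y_ik - P_ik(beta)| ||X_ik||, r_k = r A_k / A, pi_ik = a_ik / A_k."""
    a = [np.abs(y - logistic(X @ beta)) * np.linalg.norm(X, axis=1) for X, y in sites]
    A = sum(ak.sum() for ak in a)
    r_k = [r * ak.sum() / A for ak in a]
    pi = [ak / ak.sum() for ak in a]
    return a, A, r_k, pi

rng = np.random.default_rng(9)
beta = np.array([0.3, -0.7, 1.0])                 # hypothetical coefficient vector
sites = []
for m in (400, 600, 1000):                        # three hypothetical sites
    X = rng.normal(size=(m, 3))
    sites.append((X, rng.binomial(1, logistic(X @ beta)).astype(float)))

r = 500
a, A, r_k, pi = allocation(sites, beta, r)
# 1 / (r_k pi_ik) should equal A / (r a_ik), the form in the first line of (7.45)
ok = [np.allclose(1.0 / (rk * pk), A / (r * ak)) for ak, rk, pk in zip(a, r_k, pi)]
print(ok, sum(r_k))
```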

Note that

$$\begin{aligned}&\bigg |\frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}-\frac{1}{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|}\bigg |\nonumber \\&\quad =\big |e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\hat{\varvec{\beta }}_\mathrm{MLE}}-e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_{0}}\big |\nonumber \\&\quad \le e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert +e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^2\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2, \end{aligned}$$
(7.46)

and

$$\begin{aligned}&\big |P_{ik}(\tilde{\varvec{\beta }}_0)-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\big |\nonumber \\&\quad =\frac{\big |e^{\tilde{\varvec{\beta }}_{0}^T\mathbf {X}_{ik}}-e^{\hat{\varvec{\beta }}_\mathrm{MLE}^T\mathbf {X}_{ik}}\big |}{(1+e^{\tilde{\varvec{\beta }}_{0}^T\mathbf {X}_{ik}})(1+e^{\hat{\varvec{\beta }}_\mathrm{MLE}^T\mathbf {X}_{ik}})}\nonumber \\&\quad \le e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert +e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^2\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2. \end{aligned}$$
(7.47)
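The equalities in (7.46) and (7.47) follow from two logistic identities: \(1/|Y-P(\varvec{\beta })|=1+e^{(2Y-1)\mathbf {X}^T\varvec{\beta }}\) for \(Y\in \{0,1\}\), and \(P(a)-P(b)=(e^a-e^b)/\{(1+e^a)(1+e^b)\}\). The NumPy sketch below checks both for hypothetical coefficient vectors. It does not check the subsequent inequalities, which depend on the constant \(\lambda \) defined earlier in the paper.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(8)
x = rng.normal(size=(50, 3))
b_pilot, b_mle = rng.normal(size=3), rng.normal(size=3)   # hypothetical coefficient vectors
eta_p, eta_m = x @ b_pilot, x @ b_mle

checks = []
for y in (0, 1):
    # (7.46): 1/|y - P(beta)| = 1 + exp{(2y - 1) x^T beta}, so the difference is one of exponentials
    lhs = 1 / np.abs(y - logistic(eta_p)) - 1 / np.abs(y - logistic(eta_m))
    rhs = np.exp((2 * y - 1) * eta_p) - np.exp((2 * y - 1) * eta_m)
    checks.append(np.allclose(lhs, rhs))
# (7.47): P(a) - P(b) = (e^a - e^b) / {(1 + e^a)(1 + e^b)}
lhs47 = logistic(eta_p) - logistic(eta_m)
rhs47 = (np.exp(eta_p) - np.exp(eta_m)) / ((1 + np.exp(eta_p)) * (1 + np.exp(eta_m)))
checks.append(np.allclose(lhs47, rhs47))
print(checks)
```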

It follows from (7.44)–(7.47) that

$$\begin{aligned}&\Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \nonumber \\&\quad \le \frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\qquad +\frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\qquad +\frac{3}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert \nonumber \\&\qquad +\frac{3}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2\nonumber \\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1})\nonumber \\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2}). \end{aligned}$$
(7.48)

By (7.43) and (7.48),

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\{{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\}^{T}=\mathbf {I}+O_{P|\mathcal {F}_{n}}(r_0^{-1/2}). \end{aligned}$$
(7.49)

By (7.41), (7.49) and Slutsky’s theorem, as \(r_0 \rightarrow \infty \), \(r \rightarrow \infty \) and \(n \rightarrow \infty \), we obtain

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$

This completes the proof. \(\square \)
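The following simulation illustrates the conclusion numerically. It is a hypothetical Python/NumPy version of the two-step distributed procedure analysed above, not the authors' implementation. The second-step allocation \(r_k\pi _{ik}\propto |Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \) is read off from (7.45), and the estimator maximises the inverse-probability-weighted log-likelihood with weights \(1/(r_k\pi _{ik})\). The remaining details are assumptions: a uniform pilot of \(r_0/K\) points per site, rounding of \(r_k\) to integers, a Newton–Raphson solver, and not pooling the pilot with the second-step sample.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def weighted_logistic_mle(X, y, w, iters=30):
    """Newton-Raphson for the weighted log-likelihood sum_i w_i {y_i x_i'b - log(1 + e^{x_i'b})}."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = logistic(X @ beta)
        grad = X.T @ (w * (y - p))                 # weighted score
        hess = (X.T * (w * p * (1 - p))) @ X       # weighted observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def two_step(sites, r0, r, rng):
    """Hypothetical two-step distributed subsampling estimator (illustration only)."""
    # Step 1 (assumption): uniform pilot subsample of r0 // K points per site.
    m0 = r0 // len(sites)
    idx0 = [rng.choice(len(y), size=m0, replace=True) for _, y in sites]
    X0 = np.vstack([X[i] for (X, _), i in zip(sites, idx0)])
    y0 = np.concatenate([y[i] for (_, y), i in zip(sites, idx0)])
    beta0 = weighted_logistic_mle(X0, y0, np.ones(len(y0)))
    # Step 2: a_ik = |Y_ik - P_ik(beta0)| ||X_ik||; r_k proportional to site totals, pi_ik within site.
    a = [np.abs(y - logistic(X @ beta0)) * np.linalg.norm(X, axis=1) for X, y in sites]
    total = sum(ak.sum() for ak in a)
    Xs, ys, ws = [], [], []
    for (X, y), ak in zip(sites, a):
        r_k = max(1, int(round(r * ak.sum() / total)))   # integer rounding is an assumption
        pi_k = ak / ak.sum()
        idx = rng.choice(len(y), size=r_k, replace=True, p=pi_k)
        Xs.append(X[idx]); ys.append(y[idx]); ws.append(1.0 / (r_k * pi_k[idx]))
    return weighted_logistic_mle(np.vstack(Xs), np.concatenate(ys), np.concatenate(ws))

rng = np.random.default_rng(2021)
beta_true = np.array([0.5, -1.0, 0.8])            # hypothetical true coefficients
sites = []
for n_k in (4000, 5000, 6000, 5000):              # K = 4 hypothetical sites
    X = np.column_stack([np.ones(n_k), rng.normal(size=(n_k, 2))])
    y = rng.binomial(1, logistic(X @ beta_true)).astype(float)
    sites.append((X, y))

X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_mle = weighted_logistic_mle(X_all, y_all, np.ones(len(y_all)))  # full-data MLE
beta_breve = two_step(sites, r0=400, r=2000, rng=rng)
print(beta_mle, beta_breve)
```

Repeating `two_step` over many seeds and standardising \(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\) by an estimate of \(\varvec{\Sigma }^{1/2}\) would give an empirical check of the normal limit. The single run above only shows that \(\breve{\varvec{\beta }}\) lands close to \(\hat{\varvec{\beta }}_\mathrm{MLE}\).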

Cite this article

Zuo, L., Zhang, H., Wang, H. et al. Optimal subsample selection for massive logistic regression with distributed data. Comput Stat 36, 2535–2562 (2021). https://doi.org/10.1007/s00180-021-01089-0

  • DOI: https://doi.org/10.1007/s00180-021-01089-0
