Optimal subsample selection for massive logistic regression with distributed data

Zuo, Lulu; Zhang, Haixiang; Wang, HaiYing; Sun, Liuquan

doi:10.1007/s00180-021-01089-0

Original paper
Published: 27 February 2021

Volume 36, pages 2535–2562, (2021)
Computational Statistics

Abstract

With the emergence of big data, it is increasingly common that the data are distributed, i.e., the data are stored at many distributed sites (machines or nodes) owing to data collection or business operations, etc. We propose a distributed subsampling procedure in such a setting to efficiently approximate the maximum likelihood estimator for logistic regression. We establish the consistency and asymptotic normality of the subsample estimator given the full data. The optimal subsampling probabilities and optimal allocation sizes are explicitly obtained. We develop a two-step algorithm to approximate the optimal subsampling procedure. Numerical simulations and an application to airline data are presented to evaluate the performance of our subsampling method.

References

Ai M, Yu J, Zhang H, Wang H (2020) Optimal subsampling algorithms for big data generalized linear models. Stat Sin. https://doi.org/10.5705/ss.202018.0439
Battey H, Fan J, Liu H, Lu J, Zhu Z (2018) Distributed testing and estimation under sparse high dimensional models. Ann Stat 46:1352–1382
Corbett J, Dean J, Epstein M et al (2013) Spanner: Google’s globally distributed database. ACM Trans Comput Syst 31, Article No. 8
Ferguson T (1996) A course in large sample theory. Chapman and Hall, New York
Jordan M, Lee J, Yang Y (2019) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:668–681
Kiefer J (1959) Optimum experimental designs. J R Stat Soc B 21:272–319
Ma P, Mahoney M, Yu B (2015) A statistical perspective on algorithmic leveraging. J Mach Learn Res 16:861–911
Schifano E, Wu J, Wang C, Yan J, Chen M (2016) Online updating of statistical inference in the big data setting. Technometrics 58:393–403
Shi C, Lu W, Song R (2018) A massive data framework for M-estimators with cubic-rate. J Am Stat Assoc 113:1698–1709
van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, London
Volgushev S, Chao S, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47:1634–1662
Wang H (2019) More efficient estimation for logistic regression with optimal subsample. J Mach Learn Res 20:1–59
Wang H, Zhu R, Ma P (2018) Optimal subsampling for large sample logistic regression. J Am Stat Assoc 113:829–844
Wang H, Yang M, Stufken J (2019) Information-based optimal subdata selection for big data linear regression. J Am Stat Assoc 114:393–405
Zhang T, Ning Y, Ruppert D (2020) Optimal sampling for generalized linear models under measurement constraints. J Comput Graph Stat. https://doi.org/10.1080/10618600.2020.1778483
Zhao T, Cheng G, Liu H (2016) A partially linear framework for massive heterogeneous data. Ann Stat 44:1400–1437
Zuo L, Zhang H, Wang H, Liu L (2021) Sampling-based estimation for massive survival data with additive hazards model. Stat Med 40:441–450

Acknowledgements

The authors would like to thank the Editor, an Associate Editor and three reviewers for their constructive and insightful comments that greatly improved the manuscript. The work of Wang was supported by National Science Foundation (NSF), USA grant DMS-1812013. The work of Sun was supported in part by the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015 and 11926341) and Key Laboratory of RCSDS, CAS (No. 2008DP173182).

Author information

Authors and Affiliations

Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
Lulu Zuo & Haixiang Zhang
Department of Statistics, University of Connecticut, Storrs, Mansfield, CT, 06269, USA
HaiYing Wang
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
Liuquan Sun

Corresponding author

Correspondence to Haixiang Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

If Assumptions (A.1)–(A.3) hold, then conditional on $\mathcal {F}_n$, we have

$$\begin{aligned} \frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}, \end{aligned}$$

(7.1)

and

$$\begin{aligned} \tilde{\mathcal {H}}_{X}^{-1}=O_{P|\mathcal {F}_{n}}(1), \end{aligned}$$

(7.2)

where $\tilde{\mathcal {H}}_{X}=\frac{\partial ^2\ell ^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}$, and the probability measure in $O_{P|\mathcal {F}_{n}}(\cdot )$ is the conditional measure given $\mathcal {F}_n$.

Proof

For any $\varvec{\beta }\in J_B $, we can derive that

$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}=\frac{\dot{\ell }(\varvec{\beta })}{n}. \end{aligned}$$

(7.3)

For the jth component of $\dot{\ell }^*(\varvec{\beta })$, i.e., $\dot{\ell }_j^*(\varvec{\beta })=\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{(\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}_{ik}^*)_j}{\pi ^*_{ik}}$,

$$\begin{aligned}&E\bigg \{\frac{\dot{\ell }^*_j(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}^2\\&\quad =E\bigg \{\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{(\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}_{ik}^*)_j}{\pi ^*_{ik}}- \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\bigg |\mathcal {F}_{n}\bigg \}^2\\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\bigg [\sum _{i=1}^{n_k}\frac{(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j^2}{\pi _{ik}}-\Big (\sum _{i=1}^{n_k} (\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\Big )^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}. \end{aligned}$$

By Assumption (A.2),

$$\begin{aligned} E\left\{ \frac{\dot{\ell }^*_j(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\right\} ^2 =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

Using Markov’s inequality together with (7.3), we obtain

$$\begin{aligned} \frac{\dot{\ell }^*(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}=O_{P|\mathcal {F}_{n}}\Bigg (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Bigg )^{1/2}. \end{aligned}$$

(7.4)

By Assumption (A.1), we have $\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}-\frac{\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )^{1/2}$. Because $\frac{\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=0$, it follows that (7.1) holds.

To prove (7.2), some direct calculations yield that

$$\begin{aligned} E\left\{ \frac{\partial ^2\ell ^*(\varvec{\beta })}{n \partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\right\} =\frac{ \partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}. \end{aligned}$$

(7.5)

For any component $\frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}$ of $\frac{\partial ^2\ell ^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}$ with $1\le j_1,j_2\le p$, we can derive that

$$\begin{aligned}&E\Big \{\frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\Big \}^2 \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\bigg [\sum _{i=1}^{n_k}\frac{\{w^2_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}}{\pi _{ik}}-\left( \sum _{i=1}^{n_k} \{w_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}\right) ^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^4}{\pi _{ik}}. \end{aligned}$$

By Assumption (A.2),

$$\begin{aligned} E\left\{ \frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\right\} ^2 =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

It follows from Markov’s inequality that

$$\begin{aligned} \frac{\partial ^2\ell ^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T} =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}. \end{aligned}$$

(7.6)

Based on Assumptions (A.1) and (A.3), we know (7.2) holds. This ends the proof. $\square $
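The variance calculation in the proof of Lemma 1 can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the exact conditional variance of the $j$th component of $\dot{\ell }^*(\varvec{\beta })/n$ from the display preceding Assumption (A.2), and compares it with the upper bound $\frac{1}{n^2}\sum _k\frac{1}{r_k}\sum _i\Vert \mathbf {X}_{ik}\Vert ^2/\pi _{ik}$. The function names, the representation of the sites as a list of `(X, y)` pairs, and the uniform sampling probabilities in the demo are assumptions made for illustration.

```python
import numpy as np


def score_terms(X, y, beta):
    """Per-observation score contributions {y_i - p_i(beta)} x_i, shape (n_k, p)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return (y - p)[:, None] * X


def conditional_variance_j(sites, beta, probs, r, j):
    """Exact conditional variance of the j-th component of the weighted
    subsample score divided by n, given the full data."""
    n = sum(X.shape[0] for X, _ in sites)
    total = 0.0
    for (X, y), pi_k, r_k in zip(sites, probs, r):
        g = score_terms(X, y, beta)[:, j]
        total += (np.sum(g**2 / pi_k) - np.sum(g) ** 2) / r_k
    return total / n**2


def variance_bound(sites, probs, r):
    """Upper bound (1/n^2) sum_k (1/r_k) sum_i ||x_ik||^2 / pi_ik."""
    n = sum(X.shape[0] for X, _ in sites)
    total = 0.0
    for (X, _), pi_k, r_k in zip(sites, probs, r):
        total += np.sum(np.sum(X**2, axis=1) / pi_k) / r_k
    return total / n**2


# Hypothetical demo data: two sites, uniform subsampling probabilities.
rng = np.random.default_rng(0)
demo_sites = []
for n_k in (30, 50):
    X_k = rng.normal(size=(n_k, 3))
    y_k = rng.integers(0, 2, size=n_k).astype(float)
    demo_sites.append((X_k, y_k))
demo_probs = [np.full(X.shape[0], 1.0 / X.shape[0]) for X, _ in demo_sites]
demo_beta = np.array([0.3, -0.2, 0.1])
print(conditional_variance_j(demo_sites, demo_beta, demo_probs, [10, 20], 0))
print(variance_bound(demo_sites, demo_probs, [10, 20]))
```

The bound holds because $|Y_{ik}-P_{ik}(\varvec{\beta })|\le 1$ and the subtracted squared sum is nonnegative, so the first printed value should never exceed the second.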

Proof of Theorem 1

Conditional on $\mathcal {F}_{n}$, Assumption (A.5), Lemma 1 and (7.4) imply that $\frac{\dot{\ell }^*(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}\rightarrow 0$ in probability. Note that the parameter space $J_B$ is compact, and $\hat{\varvec{\beta }}_\mathrm{MLE}$ is the unique solution to $\frac{\dot{\ell }(\varvec{\beta })}{n}=0$. Thus, it follows from Theorem 5.9 of van der Vaart (1998) and the remark following it that, conditional on $\mathcal {F}_{n}$, as $n\rightarrow \infty $,

$$\begin{aligned} \Vert \tilde{\varvec{\beta }} - \hat{\varvec{\beta }}_\mathrm{MLE}\Vert =o_{P|\mathcal {F}_{n}}(1). \end{aligned}$$

(7.7)

Using Taylor’s theorem (Ferguson 1996, Chapter 4), we have

$$\begin{aligned} 0=\frac{\dot{\ell }_j^*(\tilde{\varvec{\beta }})}{n}=\frac{\dot{\ell }_j^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+\frac{\partial ^2\ell _j^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})+\frac{1}{n}R_{j}, \end{aligned}$$

(7.8)

where

$$\begin{aligned} R_{j}=(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})^T\int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}). \end{aligned}$$

Note that for all $\varvec{\beta }$,

$$\begin{aligned} \Big \Vert \frac{\partial ^2\ell _j^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big \Vert= & {} \Big \Vert \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{P^*_{ik}(\varvec{\beta })(1-P^*_{ik}(\varvec{\beta }))(1-2P^*_{ik}(\varvec{\beta }))}{\pi ^*_{ik}} \mathbf {X}_{ik}^*\mathbf {X}_{ik}^{*T}\mathbf {X}_{ik}^*\Big \Vert \\\le & {} \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}. \end{aligned}$$

Thus,

$$\begin{aligned} \Big \Vert \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\Big \Vert \le \sum _{k=1}^K\frac{1}{2r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}} =O_{P|\mathcal {F}_{n}}(n),\quad \nonumber \\ \end{aligned}$$

(7.9)

where the last equality is from the fact that

$$\begin{aligned}&P\left( \sum _{k=1}^K\frac{1}{nr_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}\ge \tau \Big |\mathcal {F}_{n}\right) \\&\quad \le \frac{1}{n\tau }E\left( \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}\right) \\&\quad =\frac{1}{n\tau }\sum _{k=1}^K\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3 \rightarrow 0, \end{aligned}$$

as $\tau \rightarrow \infty $ with Assumption (A.4). From (7.8) and (7.9), we have

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2)\Big \}. \end{aligned}$$

(7.10)

It follows from (7.1) and (7.2), together with (7.7) and (7.10) that

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}+o_{P|\mathcal {F}_{n}}(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ). \end{aligned}$$

Hence, $ \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )^{1/2}. $ This ends the proof. $\square $
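The estimator $\tilde{\varvec{\beta }}$ analysed above solves the weighted score equation $\dot{\ell }^*(\varvec{\beta })=0$, and the expansion (7.8) is the linearisation that a Newton step uses. The following is a minimal sketch under stated assumptions, not the authors' implementation: subsamples are drawn with replacement within each site, each draw receives weight $1/(r_k\pi ^*_{ik})$, and the weighted equation is solved by Newton's method started at zero. The function names, the `(X, y)` site representation, and the stopping rule are hypothetical.

```python
import numpy as np


def weighted_logistic_newton(X, y, w, n_iter=50, tol=1e-10):
    """Solve sum_i w_i {y_i - p_i(beta)} x_i = 0 by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        # Weighted information matrix (negative Hessian of the weighted log-likelihood).
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def subsample_estimate(sites, probs, r, rng):
    """Draw r_k points with replacement at site k using probs[k], then solve
    the inverse-probability-weighted score equation on the pooled subsample."""
    Xs, ys, ws = [], [], []
    for (X, y), pi_k, r_k in zip(sites, probs, r):
        idx = rng.choice(X.shape[0], size=r_k, replace=True, p=pi_k)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r_k * pi_k[idx]))
    return weighted_logistic_newton(np.vstack(Xs), np.concatenate(ys), np.concatenate(ws))
```

Calling `weighted_logistic_newton` on the pooled full data with unit weights gives the full-data maximum likelihood estimate, so the same routine can be used to compare $\tilde{\varvec{\beta }}$ with $\hat{\varvec{\beta }}_\mathrm{MLE}$ in a simulation. Plain Newton iterations can fail when the subsample is separable, which this sketch does not guard against.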

Proof of Theorem 2

Note that

$$\begin{aligned} \frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}^*}{\pi ^*_{ik}} =\sum _{k=1}^{K}\sum _{i=1}^{r_k}\varvec{\eta }_{ik}. \end{aligned}$$

(7.11)

Given $\mathcal {F}_{n}$, we know that $ \{\varvec{\eta }_{ik}: i=1,\ldots ,r_k,k=1,\ldots ,K\}$ are independent random variables with

$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k}Var(\varvec{\eta }_{ik}|\mathcal {F}_{n})\nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}} -\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2\nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}} +O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) \end{aligned}$$

(7.12)

$$\begin{aligned}&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}}+o_P(1), \end{aligned}$$

(7.13)

where (7.12) and (7.13) hold by Assumptions (A.2) and (A.5), respectively. Meanwhile, for every $\varepsilon > 0$,

$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k}E\{\Vert \varvec{\eta }_{ik}\Vert ^2I(\Vert \varvec{\eta }_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}\}\\&\quad \le \sum _{k=1}^{K}\sum _{i=1}^{r_k}E\Big \{\Vert \varvec{\eta }_{ik}\Vert ^2\cdot \frac{\Vert \varvec{\eta }_{ik}\Vert }{\varepsilon }\Big |\mathcal {F}_{n}\Big \}\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k}E(\Vert \varvec{\eta }_{ik}\Vert ^3|\mathcal {F}_{n})\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k}\frac{1}{n^3r_k^3}\sum _{i=1}^{n_k}\frac{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2}\\&\quad \le \frac{1}{\varepsilon }\sum _{k=1}^{K}\frac{1}{n^3r_k^2}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2}.\\ \end{aligned}$$

By Assumptions (A.5) and (A.6), we can derive that

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k}E\{\Vert \varvec{\eta }_{ik}\Vert ^2I(\Vert \varvec{\eta }_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}\}\le \frac{1}{\varepsilon }O_P\Big (\sum _{k=1}^{K}\frac{n_k^3}{n^3r_k^2}\Big )\le \frac{1}{\varepsilon }O_P\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )=o_P(1). \end{aligned}$$
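The middle inequality in the display above can be verified term by term. A short check, using $n_k\le n$ and assuming that every site contributes at least one draw ($r_k\ge 1$, an assumption made here for the illustration):

$$\begin{aligned} \frac{n_k^3}{n^3r_k^2}=\frac{n_k^2}{n^2r_k}\cdot \frac{n_k}{nr_k}\le \frac{n_k^2}{n^2r_k},\quad k=1,\ldots ,K. \end{aligned}$$

Summing over $k$ gives the stated bound.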

In view of (7.11) and (7.13), together with the Lindeberg–Feller central limit theorem (Proposition 2.27 of van der Vaart 1998) and Slutsky’s theorem, conditional on $\mathcal {F}_{n}$, as $n \rightarrow \infty $ and $r_k \rightarrow \infty $, we have that

$$\begin{aligned} \frac{1}{n}\varvec{\Gamma }^{-1/2}\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$

(7.14)

From Lemma 1, (7.10) and Theorem 1,

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

(7.15)

It can be checked that

$$\begin{aligned}&-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}-(\tilde{\mathcal {H}}_{X}^{-1}-{\mathcal {H}}_{X}^{-1})\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+[{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}_{X}^{-1}]\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

Hence,

$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

(7.16)

By Assumption (A.2), we have

$$\begin{aligned} \varvec{\Sigma }=\mathcal {H}_{X}^{-1}\varvec{\Gamma }\mathcal {H}_{X}^{-1}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$

(7.17)

Thus, (7.16) and (7.17) yield that

$$\begin{aligned}&\varvec{\Sigma }^{-1/2}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}\varvec{\Gamma }^{-1/2}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}\varvec{\Gamma }^{-1/2}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+o_P(1). \end{aligned}$$

(7.18)

Note that

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}({\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2})^T={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}{\varvec{\Gamma }}^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}=\mathbf {I}.\nonumber \\ \end{aligned}$$

(7.19)

By (7.17), (7.18) and Slutsky’s theorem, we obtain that, as $n \rightarrow \infty $,

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$

This ends the proof. $\square $
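The algebraic identity (7.19) behind the last step can be checked numerically. The following is a minimal sketch, assuming only that $\mathcal {H}_{X}$ and $\varvec{\Gamma }$ are symmetric positive definite; the matrices in the demo are random stand-ins rather than quantities computed from data, and the helper names are hypothetical.

```python
import numpy as np


def sym_power(A, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**power) @ vecs.T


def sandwich(H, Gamma):
    """Sandwich matrix Sigma = H^{-1} Gamma H^{-1}, symmetrised against rounding."""
    H_inv = np.linalg.inv(H)
    S = H_inv @ Gamma @ H_inv
    return (S + S.T) / 2.0


# Hypothetical symmetric positive definite stand-ins for H_X and Gamma.
rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
C = rng.normal(size=(4, 4))
H_demo = B @ B.T + np.eye(4)
Gamma_demo = C @ C.T + np.eye(4)

Sigma_demo = sandwich(H_demo, Gamma_demo)
M_demo = sym_power(Sigma_demo, -0.5) @ np.linalg.inv(H_demo) @ sym_power(Gamma_demo, 0.5)
# M M^T should be numerically close to the identity, as in (7.19).
print(np.max(np.abs(M_demo @ M_demo.T - np.eye(4))))
```

Because $M M^T=\mathbf {I}$ for $M=\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}$, multiplying a standard normal limit by $M$ leaves its covariance equal to the identity, which is what the final step of the proof uses.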

Proof of Theorem 3

It can be shown that

$$\begin{aligned} tr(\varvec{\Gamma })= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}tr\left( \frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}}\right) \nonumber \\= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}\nonumber \\= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left[ \sum _{i=1}^{n_k}\pi _{ik}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}\right] \nonumber \\\ge & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left[ \sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right] ^2 \end{aligned}$$

(7.20)

$$\begin{aligned}= & {} \frac{1}{n^2}\frac{1}{r}\sum _{k=1}^{K}r_k\sum _{k=1}^{K}\frac{\big [\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \big ]^2}{r_k}\nonumber \\\ge & {} \frac{1}{n^2r}\left[ \sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right] ^2, \end{aligned}$$

(7.21)

where (7.20) and (7.21) follow from the Cauchy-Schwarz inequality, and the equalities hold if and only if $\pi _{ik}\propto |Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert $ and $r_k\propto \sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert $, respectively. This ends the proof. $\square $
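The equality conditions in the proof of Theorem 3 translate directly into a computation. The following is a minimal sketch, not the authors' code: within each site the probabilities are proportional to $|Y_{ik}-P_{ik}(\varvec{\beta })|\Vert \mathbf {X}_{ik}\Vert $, and the allocation sizes are proportional to the site totals of the same quantity. The function names and the `(X, y)` site representation are assumptions, `beta` stands in for $\hat{\varvec{\beta }}_\mathrm{MLE}$, and the allocation sizes are left unrounded.

```python
import numpy as np


def _abs_resid_times_norm(X, y, beta):
    """|y_i - p_i(beta)| * ||x_i|| for each observation at one site."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.abs(y - p) * np.linalg.norm(X, axis=1)


def optimal_design(sites, beta, r):
    """Within-site probabilities and (unrounded) allocation sizes that attain
    equality in (7.20) and (7.21)."""
    s = [_abs_resid_times_norm(X, y, beta) for X, y in sites]
    totals = np.array([s_k.sum() for s_k in s])
    probs = [s_k / s_k.sum() for s_k in s]
    alloc = r * totals / totals.sum()
    return probs, alloc


def trace_gamma(sites, beta, probs, alloc):
    """tr(Gamma) as written in the first line of the display ending at (7.20)."""
    n = sum(X.shape[0] for X, _ in sites)
    total = 0.0
    for (X, y), pi_k, r_k in zip(sites, probs, alloc):
        s_k = _abs_resid_times_norm(X, y, beta)
        total += np.sum(s_k**2 / pi_k) / r_k
    return total / n**2


def lower_bound(sites, beta, r):
    """Right-hand side of (7.21)."""
    n = sum(X.shape[0] for X, _ in sites)
    s_total = sum(_abs_resid_times_norm(X, y, beta).sum() for X, y in sites)
    return s_total**2 / (n**2 * r)
```

With these choices `trace_gamma` equals `lower_bound` up to rounding error, while any other design, for example uniform probabilities with allocation proportional to $n_k$, gives a value at least as large.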

Next, we establish two lemmas that will be used in the proofs of Theorems 4 and 5.

Lemma 2

Under Assumptions (A.4) and (A.7), for $l=2$ and $l=4$,

$$\begin{aligned} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^l}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-1}), \end{aligned}$$

(7.22)

and

$$\begin{aligned} \frac{1}{n^3}\sum _{k=1}^{K}\frac{1}{r_k^2(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-2}). \end{aligned}$$

(7.23)

Proof

It follows from the expressions of $r_k(\tilde{\varvec{\beta }}_0)$ and $\pi _{ik}(\tilde{\varvec{\beta }}_0)$ that

$$\begin{aligned}&\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^l}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}\nonumber \\&\quad =\frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^l\nonumber \\&\quad =\frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^{l-1}}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}\cdot \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\quad \le \frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}(1+e^{\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}+e^{-\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0})\end{aligned}$$

(7.24)

$$\begin{aligned}&\quad \le \frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}(1+2e^{\lambda \Vert \mathbf {X}_{ik}\Vert })\nonumber \\&\quad \le \frac{3}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }, \end{aligned}$$

(7.25)

where (7.24) holds by Assumption (A.4). Note that

$$\begin{aligned} E\{\Vert \mathbf {X}_{ik}\Vert ^{l-1}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\}\le \{E(\Vert \mathbf {X}_{ik}\Vert ^{2(l-1)})E(e^{2\lambda \Vert \mathbf {X}_{ik}\Vert })\}^{1/2}<\infty . \end{aligned}$$

(7.26)

Hence, (7.22) follows from (7.25), (7.26) and the law of large numbers. Analogously, we can prove that (7.23) holds. This ends the proof. $\square $
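The first equality in the chain leading to (7.24) and the elementary bound $1/|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\le 1+e^{\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}+e^{-\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}$ can both be checked numerically. The following is a minimal sketch: the plug-in forms of $\pi _{ik}(\tilde{\varvec{\beta }}_0)$ and $r_k(\tilde{\varvec{\beta }}_0)$ used here are inferred from that first equality (they satisfy $r_k\pi _{ik}=r|Y_{ik}-P_{ik}|\Vert \mathbf {X}_{ik}\Vert /\sum _{k}\sum _{i}|Y_{ik}-P_{ik}|\Vert \mathbf {X}_{ik}\Vert $), and all function names are hypothetical.

```python
import numpy as np


def plug_in_weights(sites, beta0):
    """s_ik = |y_ik - p_ik(beta0)| * ||x_ik|| for every site."""
    out = []
    for X, y in sites:
        p = 1.0 / (1.0 + np.exp(-X @ beta0))
        out.append(np.abs(y - p) * np.linalg.norm(X, axis=1))
    return out


def lhs_direct(sites, beta0, r, l):
    """(1/n^2) sum_k (1/r_k) sum_i ||x_ik||^l / pi_ik with plug-in r_k and pi_ik."""
    s = plug_in_weights(sites, beta0)
    s_total = sum(s_k.sum() for s_k in s)
    n = sum(X.shape[0] for X, _ in sites)
    total = 0.0
    for (X, _), s_k in zip(sites, s):
        pi_k = s_k / s_k.sum()
        r_k = r * s_k.sum() / s_total
        total += np.sum(np.linalg.norm(X, axis=1) ** l / pi_k) / r_k
    return total / n**2


def rhs_simplified(sites, beta0, r, l):
    """(1/(r n^2)) sum_k sum_i (S / s_ik) ||x_ik||^l, with S the grand total of s_ik."""
    s = plug_in_weights(sites, beta0)
    s_total = sum(s_k.sum() for s_k in s)
    n = sum(X.shape[0] for X, _ in sites)
    total = sum(
        np.sum(s_total / s_k * np.linalg.norm(X, axis=1) ** l)
        for (X, _), s_k in zip(sites, s)
    )
    return total / (r * n**2)


def inverse_residual_bound_holds(X, y, beta0):
    """Check 1/|y - p| <= 1 + exp(x'beta0) + exp(-x'beta0) elementwise."""
    eta = X @ beta0
    p = 1.0 / (1.0 + np.exp(-eta))
    bound = 1.0 + np.exp(eta) + np.exp(-eta)
    return bool(np.all(1.0 / np.abs(y - p) <= bound * (1.0 + 1e-12)))
```

The two functions `lhs_direct` and `rhs_simplified` agree for $l=2$ and $l=4$ because the site-level totals cancel, which is why the bound in Lemma 2 depends on the total subsample size $r$ only.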

Lemma 3

If Assumptions (A.1), (A.4) and (A.7) hold, conditional on $\mathcal {F}_n$ we have

$$\begin{aligned} \frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}(r^{-1/2}), \end{aligned}$$

(7.27)

and

$$\begin{aligned} \{\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\}^{-1} = O_{P|\mathcal {F}_{n}}(1), \end{aligned}$$

(7.28)

where $\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}=\frac{\partial ^2\ell ^*_{\tilde{\varvec{\beta }}_0}(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}$.

Proof

For any $\varvec{\beta }\in J_B $, we can derive that

$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\bigg \}=\frac{\dot{\ell }(\varvec{\beta })}{n}. \end{aligned}$$

(7.29)

For the jth component $\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })$ of $\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })$,

$$\begin{aligned}&E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })}{n}-\frac{\dot{\ell }_{j}(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\bigg \}^2\\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\bigg [\sum _{i=1}^{n_k}\frac{(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})^2_j}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}- \Big \{\sum _{i=1}^{n_k}(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\Big \}^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}. \end{aligned}$$

By Lemma 2,

$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}^2 =O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$

(7.30)

In view of Markov’s inequality and Assumption (A.1), (7.27) follows from (7.29) and (7.30).

In a similar manner, we obtain

$$\begin{aligned} E\Big \{\frac{\partial ^2\ell ^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\Big \}=\frac{\partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}. \end{aligned}$$

(7.31)

For any component $\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^{*j_1j_2}(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}$ of $\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}$ with $1\le j_1,j_2\le p$, it can be shown that

$$\begin{aligned}&E\Big \{\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^{*j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\Big \}^2 \nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\bigg [\sum _{i=1}^{n_k}\frac{\{w^2_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\left( \sum _{i=1}^{n_k} \{w_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}\right) ^2\bigg ]\nonumber \\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^4}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-1}), \end{aligned}$$

(7.32)

where (7.32) holds by Lemma 2. From (7.31), (7.32) and Markov’s inequality, we know that (7.28) holds. This ends the proof. $\square $

Proof of Theorem 4

It follows from (7.29) and (7.30) that given $\mathcal {F}_{n}$,

$$\begin{aligned} \frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}\rightarrow 0 \end{aligned}$$

in probability. Thus, conditional on $\mathcal {F}_{n}$,

$$\begin{aligned} \Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert =o_P(1), \end{aligned}$$

(7.33)

which ensures that $\breve{\varvec{\beta }}$ is close to $\hat{\varvec{\beta }}_\mathrm{MLE}$ as long as $r$ is large enough. Using Taylor’s theorem (Ferguson 1996, Chapter 4),

$$\begin{aligned} 0=\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\breve{\varvec{\beta }})}{n}=\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0j}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})+\frac{1}{n}R_{\tilde{\varvec{\beta }}_0j}, \end{aligned}$$

(7.34)

where

$$\begin{aligned} R_{\tilde{\varvec{\beta }}_0j}=(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})^T\int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}). \end{aligned}$$

Note that for all $\varvec{\beta }$,

$$\begin{aligned} \Big \Vert \frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0j}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big \Vert= & {} \Big \Vert \sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{P^*_{ik}(\varvec{\beta })\{1-P^*_{ik}(\varvec{\beta })\}\{1-2P^*_{ik}(\varvec{\beta })\}}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)} \mathbf {X}^*_{ik}\mathbf {X}^{*T}_{ik}\mathbf {X}^*_{ik}\Big \Vert \\\le & {} \sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\Vert \mathbf {X}^*_{ik}\Vert ^3}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)}, \end{aligned}$$

and by Assumption (A.4),

$$\begin{aligned} P\left( \frac{1}{n}\sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\Vert \mathbf {X}^*_{ik}\Vert ^3}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)}\ge \tau \Big |\mathcal {F}_{n}\right)\le & {} \frac{\frac{1}{n}\sum _{k=1}^K\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3}{\tau }\rightarrow 0 \end{aligned}$$

in probability as $\tau \rightarrow \infty $. Thus,

$$\begin{aligned} \left\| \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\right\| =O_{P|\mathcal {F}_{n}}(n). \end{aligned}$$

(7.35)

By (7.34) and (7.35),

$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\{\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\}^{-1}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2)\Big \}. \end{aligned}$$

(7.36)

Based on (7.27), (7.28), (7.33) and (7.36), we have

$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})+o_{P|\mathcal {F}_{n}}(\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ). \end{aligned}$$

Hence, $\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})$. This ends the proof. $\square $
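
The last step of the proof just completed absorbs the $o_{P|\mathcal {F}_{n}}$ term into the left-hand side. A short sketch of that argument, where the symbols $\Delta _r$, $A_r$ and $\epsilon _r$ are introduced here only for illustration: write $\Delta _r=\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert $, so that the preceding display reads $\Delta _r\le A_rr^{-1/2}+\epsilon _r\Delta _r$ with $A_r=O_{P|\mathcal {F}_{n}}(1)$ and $\epsilon _r=o_{P|\mathcal {F}_{n}}(1)$. On the event $\{\epsilon _r\le 1/2\}$, whose conditional probability tends to one,

$$\begin{aligned} \Delta _r(1-\epsilon _r)\le A_rr^{-1/2}\quad \Longrightarrow \quad \Delta _r\le 2A_rr^{-1/2}=O_{P|\mathcal {F}_{n}}(r^{-1/2}). \end{aligned}$$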

Proof of Theorem 5

Let

$$\begin{aligned} \frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}^*_{ik}}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)} =\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}.\nonumber \\ \end{aligned}$$

(7.37)

Given $\mathcal {F}_{n}$ and $\tilde{\varvec{\beta }}_0$, we know that $\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}$ are independent random variables with

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}Var(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}^T_{ik}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}\nonumber \\&-\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2.\nonumber \\ \end{aligned}$$

(7.38)
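
The form of (7.38) is the usual variance of an inverse-probability-weighted mean under sampling with replacement. The following is a minimal sketch that checks the scalar analogue of one site's contribution by exact enumeration. The function names and the toy values are hypothetical, and the scalars `z[i]` stand in for the vectors $\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}/n$.

```python
from fractions import Fraction
from itertools import product


def enumerate_estimator(z, pi, r):
    """Exact mean and variance of (1/r) * sum_j z[I_j] / pi[I_j], where
    I_1, ..., I_r are i.i.d. with P(I = i) = pi[i] (sampling with replacement)."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for draw in product(range(len(z)), repeat=r):
        prob = Fraction(1)
        estimate = Fraction(0)
        for i in draw:
            prob *= pi[i]
            estimate += z[i] / pi[i]
        estimate /= r
        mean += prob * estimate
        second_moment += prob * estimate ** 2
    return mean, second_moment - mean ** 2


def closed_form_variance(z, pi, r):
    """(1/r) * { sum_i z_i^2 / pi_i - (sum_i z_i)^2 }: scalar analogue of (7.38)."""
    weighted_squares = sum(zi ** 2 / p for zi, p in zip(z, pi))
    return (weighted_squares - sum(z) ** 2) / r


# Toy values (illustrative only): three data points, three draws.
z = [Fraction(1, 2), Fraction(-1, 3), Fraction(2)]
pi = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
r = 3
mean, variance = enumerate_estimator(z, pi, r)
```

Exact rational arithmetic is used so that the enumerated variance and the closed form can be compared for equality rather than up to rounding error.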

Note that

$$\begin{aligned}&\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2\\&\quad =\frac{1}{rn^2}\sum _{k=1}^{K}\frac{(\sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik})^2}{\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \\&\quad \le \frac{1}{rn^2}\sum _{k=1}^{K}\frac{(\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert )^2}{\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \\&\quad =\frac{1}{rn^2}\left( \sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right) ^2\\&\quad \le \frac{1}{r}\left( \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \right) ^2\\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$

By (7.38), as $r\rightarrow \infty $,

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}Var(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}^T_{ik}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}+O_{P|\mathcal {F}_{n}}(r^{-1})\nonumber \\= & {} \varvec{\Gamma }^{\tilde{\varvec{\beta }}_{0}}+o_P(1). \end{aligned}$$

(7.39)

Meanwhile, for every $\varepsilon > 0$,

$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2I(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\}\\&\quad \le \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\Big \{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2\cdot \frac{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert }{\varepsilon }\Big |\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\Big \}\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^3|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{1}{n^3r_k^3(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}\\&\quad \le \frac{1}{\varepsilon }\frac{1}{n^3}\sum _{k=1}^{K}\frac{1}{r_k^2(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}. \end{aligned}$$

By Lemma 2, as $r\rightarrow \infty $, we have

$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2I(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\}\le \frac{1}{\varepsilon }O_{P|\mathcal {F}_{n}}(r^{-2})=o_P(1). \end{aligned}$$

(7.40)

By (7.37), (7.39) and the Lindeberg condition (7.40), together with the Lindeberg–Feller central limit theorem (Proposition 2.27 of van der Vaart 1998) and Slutsky's theorem, we know that, conditional on $\mathcal {F}_{n}$, as $n \rightarrow \infty $ and $r \rightarrow \infty $,

$$\begin{aligned} \frac{1}{n}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{-1/2}\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$

(7.41)

By Lemma 3, (7.36) and Theorem 4, we get that

$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(r^{-1})\Big \}. \end{aligned}$$

(7.42)

Note that

$$\begin{aligned}&-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}-(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}-{\mathcal {H}}_{X}^{-1})\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+\Bigg [{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Bigg ]\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}(r^{-1/2})O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}(r^{-1/2})\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
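
The passage from the second to the third line of the display above uses the standard identity for the difference of two matrix inverses, which holds whenever both matrices are invertible:

$$\begin{aligned} \tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}-{\mathcal {H}}_{X}^{-1}={\mathcal {H}}_{X}^{-1}({\mathcal {H}}_{X}-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X})\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}=-{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}. \end{aligned}$$

The middle expression can be verified by multiplying out: ${\mathcal {H}}_{X}^{-1}{\mathcal {H}}_{X}\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}-{\mathcal {H}}_{X}^{-1}\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}=\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}-{\mathcal {H}}_{X}^{-1}$. The four stochastic orders in the fourth line then presumably correspond, in order, to ${\mathcal {H}}_{X}^{-1}$, $\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}-{\mathcal {H}}_{X}$, $\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}$ and $\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})/n$.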

Hence, (7.42) and (7.22) yield that

$$\begin{aligned}&\varvec{\Sigma }^{-1/2}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})=-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{-1/2}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}(r^{-1/2}). \end{aligned}$$

It can be proved that

$$\begin{aligned}&{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\{{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\}^{T}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}\varvec{\Gamma }{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}+{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1} (\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad =\mathbf {I}+{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1} (\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}. \end{aligned}$$

(7.43)

For the distance between $\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}$ and $\varvec{\Gamma }$, we have

$$\begin{aligned} \Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \le \frac{1}{n^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2\left| \frac{1}{r_k(\tilde{\varvec{\beta }}_0)\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\frac{1}{r_k(\hat{\varvec{\beta }}_\mathrm{MLE})\pi _{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})}\right| . \end{aligned}$$

(7.44)

A straightforward calculation yields that

$$\begin{aligned}&\Big |\frac{1}{r_k(\tilde{\varvec{\beta }}_0)\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\frac{1}{r_k(\hat{\varvec{\beta }}_\mathrm{MLE})\pi _{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})}\Big |\nonumber \\&\quad =\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \cdot r}-\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \cdot r}\bigg |\nonumber \\&\quad \le \frac{1}{r}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert } -\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\bigg |\nonumber \\&\quad \quad +\frac{1}{r}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert } -\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\bigg |\nonumber \\&\quad \le \frac{1}{r}\bigg |\frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}-\frac{1}{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert }{\Vert \mathbf {X}_{ik}\Vert }\nonumber \\&\qquad +\frac{1}{r}\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|P_{ik}(\tilde{\varvec{\beta }}_0)-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }. \end{aligned}$$

(7.45)
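
The first equality in (7.45) relies on the product $r_k(\varvec{\beta })\pi _{ik}(\varvec{\beta })$ having a closed form in which the site-level normalising constants cancel. The following is a minimal sketch of that cancellation, assuming the allocation sizes and within-site probabilities take the form implied by that equality: $\pi _{ik}$ proportional to $|Y_{ik}-P_{ik}(\varvec{\beta })|\Vert \mathbf {X}_{ik}\Vert $ within site $k$, and $r_k$ proportional to the site total of those weights. The function names and the toy data are hypothetical, and rounding of $r_k$ to an integer is not treated.

```python
import math


def logistic_prob(x, beta):
    """P(Y = 1 | x) under the logistic model."""
    t = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-t))


def site_weights(X, y, beta):
    """Weights |y_i - p_i(beta)| * ||x_i|| for one site."""
    return [
        abs(yi - logistic_prob(xi, beta)) * math.sqrt(sum(v * v for v in xi))
        for xi, yi in zip(X, y)
    ]


def allocation(sites, beta, r):
    """Allocation sizes r_k and within-site probabilities for a total budget r.

    `sites` is a list of (X_k, y_k) pairs, one per site (assumed layout)."""
    w = [site_weights(X, y, beta) for X, y in sites]
    totals = [sum(wk) for wk in w]
    grand = sum(totals)
    r_k = [r * t / grand for t in totals]           # may be non-integer
    probs = [[wi / t for wi in wk] for wk, t in zip(w, totals)]
    return r_k, probs, w, grand


# Toy data (illustrative only): two sites, intercept plus one covariate.
sites = [
    ([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]], [1, 0, 1]),
    ([[1.0, 0.3], [1.0, -0.7]], [0, 1]),
]
beta = [0.2, -0.5]
r = 100
r_k, probs, w, grand = allocation(sites, beta, r)
```

Under this form, `1 / (r_k[k] * probs[k][i])` equals `grand / (r * w[k][i])` for every data point, which is the expression appearing inside the absolute value in the second line of (7.45).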

Note that

$$\begin{aligned}&\bigg |\frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}-\frac{1}{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|}\bigg |\nonumber \\&\quad =\big |e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\hat{\varvec{\beta }}_\mathrm{MLE}}-e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_{0}}\big |\nonumber \\&\quad \le e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert +e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^2\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2, \end{aligned}$$

(7.46)

and

$$\begin{aligned}&\big |P_{ik}(\tilde{\varvec{\beta }}_0)-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\big |\nonumber \\&\quad =\frac{\big |e^{\tilde{\varvec{\beta }}_{0}^T\mathbf {X}_{ik}}-e^{\hat{\varvec{\beta }}_\mathrm{MLE}^T\mathbf {X}_{ik}}\big |}{(1+e^{\tilde{\varvec{\beta }}_{0}^T\mathbf {X}_{ik}})(1+e^{\hat{\varvec{\beta }}_\mathrm{MLE}^T\mathbf {X}_{ik}})}\nonumber \\&\quad \le e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert +e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^2\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2. \end{aligned}$$

(7.47)
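
The first equality in (7.46) follows from a closed form for $1/|Y_{ik}-P_{ik}(\varvec{\beta })|$. As a brief derivation, assuming the logistic form $P_{ik}(\varvec{\beta })=e^{\mathbf {X}_{ik}^T\varvec{\beta }}/(1+e^{\mathbf {X}_{ik}^T\varvec{\beta }})$, which is consistent with the first equality in (7.47), we have for $Y_{ik}\in \{0,1\}$

$$\begin{aligned} \frac{1}{|Y_{ik}-P_{ik}(\varvec{\beta })|}={\left\{ \begin{array}{ll} 1+e^{\mathbf {X}_{ik}^T\varvec{\beta }}, &{} Y_{ik}=1,\\ 1+e^{-\mathbf {X}_{ik}^T\varvec{\beta }}, &{} Y_{ik}=0, \end{array}\right. }\;=1+e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\varvec{\beta }}. \end{aligned}$$

Taking the difference at $\tilde{\varvec{\beta }}_0$ and $\hat{\varvec{\beta }}_\mathrm{MLE}$ cancels the constant $1$ and leaves the difference of exponentials in (7.46).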

It follows from (7.44)–(7.47) that

$$\begin{aligned}&\Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \nonumber \\&\quad \le \frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\qquad +\frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\qquad +\frac{3}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert \nonumber \\&\qquad +\frac{3}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2\nonumber \\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1})\nonumber \\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2}). \end{aligned}$$

(7.48)

By (7.43) and (7.48),

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\{{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\}^{T}=\mathbf {I}+O_{P|\mathcal {F}_{n}}(r_0^{-1/2}). \end{aligned}$$

(7.49)
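
The order in (7.49) can be traced through the remainder term of (7.43). As a sketch, assuming ${\mathcal {H}}_{X}^{-1}=O_{P|\mathcal {F}_{n}}(1)$ as in the proof above and $\Vert \varvec{\Sigma }^{-1/2}\Vert =O_{P|\mathcal {F}_{n}}(r^{1/2})$ (the latter is not stated explicitly in this part of the proof, but it is the order needed for the $O_{P|\mathcal {F}_{n}}(r^{-1})$ remainder in (7.42) to become $O_{P|\mathcal {F}_{n}}(r^{-1/2})$ after multiplication by $\varvec{\Sigma }^{-1/2}$), we have

$$\begin{aligned} \Vert {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\Vert\le & {} \Vert {\varvec{\Sigma }}^{-1/2}\Vert ^2\Vert {\mathcal {H}}_{X}^{-1}\Vert ^2\Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \\= & {} O_{P|\mathcal {F}_{n}}(r)\cdot O_{P|\mathcal {F}_{n}}(1)\cdot O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})=O_{P|\mathcal {F}_{n}}(r_0^{-1/2}). \end{aligned}$$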

By (7.41), (7.49) and Slutsky's theorem, as $r_0 \rightarrow \infty $, $r \rightarrow \infty $ and $n \rightarrow \infty $, we obtain

$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$

This completes the proof. $\square $

About this article

Cite this article

Zuo, L., Zhang, H., Wang, H. et al. Optimal subsample selection for massive logistic regression with distributed data. Comput Stat 36, 2535–2562 (2021). https://doi.org/10.1007/s00180-021-01089-0

Received: 23 March 2020
Accepted: 13 February 2021
Published: 27 February 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s00180-021-01089-0
