Appendix
Lemma 1
If Assumptions (A.1)–(A.3) hold, then conditional on \(\mathcal {F}_n\), we have
$$\begin{aligned} \frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}, \end{aligned}$$
(7.1)
and
$$\begin{aligned} \tilde{\mathcal {H}}_{X}^{-1}=O_{P|\mathcal {F}_{n}}(1), \end{aligned}$$
(7.2)
where \(\tilde{\mathcal {H}}_{X}=\frac{\partial ^2\ell ^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\), and the probability measure in \(O_{P|\mathcal {F}_{n}}(\cdot )\) is the conditional measure given \(\mathcal {F}_n\).
Proof
For any \(\varvec{\beta }\in J_B \), we can derive that
$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}=\frac{\dot{\ell }(\varvec{\beta })}{n}. \end{aligned}$$
(7.3)
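The unbiasedness in (7.3) can be checked draw by draw. The sketch below assumes (this is not restated in this appendix and is an assumption of the sketch) that, within block \(k\), each of the \(r_k\) subsample points is drawn independently with replacement, and that the \(m\)th observation of block \(k\) is selected with probability \(\pi _{mk}\), where \(\sum _{m=1}^{n_k}\pi _{mk}=1\). The index \(m\) is introduced here only to separate the full-data index from the draw index \(i\). For a single draw from block \(k\),
$$\begin{aligned} E\bigg \{\frac{\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}^*_{ik}}{\pi ^*_{ik}}\bigg |\mathcal {F}_{n}\bigg \} =\sum _{m=1}^{n_k}\pi _{mk}\,\frac{\{Y_{mk}-P_{mk}(\varvec{\beta })\}\mathbf {X}_{mk}}{\pi _{mk}} =\sum _{m=1}^{n_k}\{Y_{mk}-P_{mk}(\varvec{\beta })\}\mathbf {X}_{mk}. \end{aligned}$$
Averaging over the \(r_k\) draws leaves this value unchanged, and summing over \(k\) and dividing by \(n\) gives the right-hand side of (7.3), with \(\dot{\ell }(\varvec{\beta })=\sum _{k=1}^{K}\sum _{m=1}^{n_k}\{Y_{mk}-P_{mk}(\varvec{\beta })\}\mathbf {X}_{mk}\).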
For the jth component of \(\dot{\ell }^*(\varvec{\beta })\), i.e., \(\dot{\ell }_j^*(\varvec{\beta })=\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{(\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}_{ik}^*)_j}{\pi ^*_{ik}}\),
$$\begin{aligned}&E\bigg \{\frac{\dot{\ell }^*_j(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}^2\\&\quad =E\bigg \{\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{(\{Y^*_{ik}-P^*_{ik}(\varvec{\beta })\}\mathbf {X}_{ik}^*)_j}{\pi ^*_{ik}}- \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\bigg |\mathcal {F}_{n}\bigg \}^2\\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\bigg [\sum _{i=1}^{n_k}\frac{(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j^2}{\pi _{ik}}-\Big (\sum _{i=1}^{n_k} (\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\Big )^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}. \end{aligned}$$
By Assumption (A.2),
$$\begin{aligned} E\left\{ \frac{\dot{\ell }^*_j(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\right\} ^2 =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
Using Markov’s inequality together with (7.3), we obtain
$$\begin{aligned} \frac{\dot{\ell }^*(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}=O_{P|\mathcal {F}_{n}}\Bigg (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Bigg )^{1/2}. \end{aligned}$$
(7.4)
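The passage from the second-moment bound to (7.4) can be spelled out as follows. The shorthand \(a_n=\sum _{k=1}^{K}n_k^2/(n^2r_k)\), the variable \(Z_j=\dot{\ell }^*_j(\varvec{\beta })/n-\dot{\ell }_j(\varvec{\beta })/n\) and the constant \(C\) are introduced here for illustration only. Supposing \(E(Z_j^2|\mathcal {F}_{n})\le Ca_n\), Markov’s inequality applied to \(Z_j^2\) gives, for any \(M>0\),
$$\begin{aligned} P\big (|Z_j|\ge Ma_n^{1/2}\big |\mathcal {F}_{n}\big ) =P\big (Z_j^2\ge M^2a_n\big |\mathcal {F}_{n}\big ) \le \frac{E(Z_j^2|\mathcal {F}_{n})}{M^2a_n} \le \frac{C}{M^2}. \end{aligned}$$
The right-hand side can be made arbitrarily small by taking \(M\) large, so \(Z_j=O_{P|\mathcal {F}_{n}}(a_n^{1/2})\) for each of the finitely many components \(j\), and hence for the whole vector.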
By Assumption (A.1), we have \(\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}-\frac{\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )^{1/2}\). Because \(\frac{\dot{\ell }(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=0\), it follows that (7.1) holds.
To prove (7.2), some direct calculations yield that
$$\begin{aligned} E\left\{ \frac{\partial ^2\ell ^*(\varvec{\beta })}{n \partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\right\} =\frac{ \partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}. \end{aligned}$$
(7.5)
For any component \(\frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) of \(\frac{\partial ^2\ell ^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) with \(1\le j_1,j_2\le p\), we can derive that
$$\begin{aligned}&E\Big \{\frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\Big \}^2 \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\bigg [\sum _{i=1}^{n_k}\frac{\{w^2_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}}{\pi _{ik}}-\left( \sum _{i=1}^{n_k} \{w_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}\right) ^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^4}{\pi _{ik}}. \end{aligned}$$
By Assumption (A.2),
$$\begin{aligned} E\left\{ \frac{\partial ^2\ell _{j_1j_2}^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\right\} ^2 =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
It follows from Markov’s inequality that
$$\begin{aligned} \frac{\partial ^2\ell ^*(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T} =O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}. \end{aligned}$$
(7.6)
Based on Assumptions (A.1) and (A.3), it follows from (7.6) that (7.2) holds. This ends the proof. \(\square \)
Proof of Theorem 1
Conditional on \(\mathcal {F}_{n}\), Assumption (A.5), Lemma 1 and (7.4) imply that \(\frac{\dot{\ell }^*(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}\rightarrow 0\) in probability. Note that the parameter space \(J_B\) is compact, and \(\hat{\varvec{\beta }}_\mathrm{MLE}\) is the unique solution to \(\frac{\dot{\ell }(\varvec{\beta })}{n}=0\). Thus, it follows from Theorem 5.9 of van der Vaart (1998) and the remark following it that, conditional on \(\mathcal {F}_{n}\), as \(n\rightarrow \infty \),
$$\begin{aligned} \Vert \tilde{\varvec{\beta }} - \hat{\varvec{\beta }}_\mathrm{MLE}\Vert =o_{P|\mathcal {F}_{n}}(1). \end{aligned}$$
(7.7)
Using Taylor’s theorem (Ferguson 1996, Chapter 4), we have
$$\begin{aligned} 0=\frac{\dot{\ell }_j^*(\tilde{\varvec{\beta }})}{n}=\frac{\dot{\ell }_j^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+\frac{\partial ^2\ell _j^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})+\frac{1}{n}R_{j}, \end{aligned}$$
(7.8)
where
$$\begin{aligned} R_{j}=(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})^T\int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}). \end{aligned}$$
Note that for all \(\varvec{\beta }\),
$$\begin{aligned} \Big \Vert \frac{\partial ^2\dot{\ell }_j^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big \Vert= & {} \Big \Vert \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{P^*_{ik}(\varvec{\beta })(1-P^*_{ik}(\varvec{\beta }))(1-2P^*_{ik}(\varvec{\beta }))}{\pi ^*_{ik}} \mathbf {X}_{ik}^*\mathbf {X}_{ik}^{*T}\mathbf {X}_{ik}^*\Big \Vert \\\le & {} \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}. \end{aligned}$$
Thus,
$$\begin{aligned} \Big \Vert \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\Big \Vert \le \sum _{k=1}^K\frac{1}{2r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}} =O_{P|\mathcal {F}_{n}}(n),\quad \nonumber \\ \end{aligned}$$
(7.9)
where the last equality is from the fact that
$$\begin{aligned}&P\left( \sum _{k=1}^K\frac{1}{nr_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}\ge \tau \Big |\mathcal {F}_{n}\right) \\&\quad \le \frac{1}{n\tau }E\left( \sum _{k=1}^K\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\Vert \mathbf {X}_{ik}^*\Vert ^3}{\pi ^*_{ik}}\Big |\mathcal {F}_{n}\right) \\&\quad =\frac{1}{n\tau }\sum _{k=1}^K\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3 \rightarrow 0, \end{aligned}$$
as \(\tau \rightarrow \infty \) with Assumption (A.4). From (7.8) and (7.9), we have
$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2)\Big \}. \end{aligned}$$
(7.10)
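The way (7.9) controls the remainder in (7.8) can be made explicit. In the sketch below, the shorthand \(\varvec{\Delta }=\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\) is introduced for illustration only. The factor \(1/2\) in (7.9) comes from \(\int _0^1\int _0^1v\,du\,dv=1/2\), and the quadratic form \(R_j\) satisfies
$$\begin{aligned} \frac{|R_j|}{n}\le \frac{1}{n}\Vert \varvec{\Delta }\Vert ^2\,\Big \Vert \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_j^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv\varvec{\Delta }\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\Big \Vert =\frac{1}{n}\Vert \varvec{\Delta }\Vert ^2\,O_{P|\mathcal {F}_{n}}(n) =O_{P|\mathcal {F}_{n}}(\Vert \varvec{\Delta }\Vert ^2). \end{aligned}$$
Stacking (7.8) over \(j=1,\ldots ,p\) and solving for \(\varvec{\Delta }\) then gives (7.10).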
It follows from (7.1) and (7.2), together with (7.7) and (7.10) that
$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}+o_{P|\mathcal {F}_{n}}(\Vert \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ). \end{aligned}$$
Hence, \( \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )^{1/2}. \) This ends the proof. \(\square \)
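The final step of the proof above absorbs the \(o_{P|\mathcal {F}_{n}}\) term into the left-hand side. As a sketch, write \(\varvec{\Delta }=\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\), \(a_n=\sum _{k=1}^{K}n_k^2/(n^2r_k)\), and let \(\epsilon _n=o_{P|\mathcal {F}_{n}}(1)\) denote the factor multiplying \(\Vert \varvec{\Delta }\Vert \); these symbols are illustrative shorthand, not notation from the main text. Then
$$\begin{aligned} \Vert \varvec{\Delta }\Vert \le O_{P|\mathcal {F}_{n}}(a_n^{1/2})+|\epsilon _n|\,\Vert \varvec{\Delta }\Vert \quad \Longrightarrow \quad (1-|\epsilon _n|)\Vert \varvec{\Delta }\Vert \le O_{P|\mathcal {F}_{n}}(a_n^{1/2}). \end{aligned}$$
Since \(1-|\epsilon _n|\ge 1/2\) with conditional probability tending to one, it follows that \(\Vert \varvec{\Delta }\Vert \le 2\,O_{P|\mathcal {F}_{n}}(a_n^{1/2})=O_{P|\mathcal {F}_{n}}(a_n^{1/2})\).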
Proof of Theorem 2
Note that
$$\begin{aligned} \frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{r_k}\frac{\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}^*}{\pi ^*_{ik}} =\sum _{k=1}^{K}\sum _{i=1}^{r_k}\varvec{\eta }_{ik}. \end{aligned}$$
(7.11)
Given \(\mathcal {F}_{n}\), we know that \( \{\varvec{\eta }_{ik}: i=1,\ldots ,r_k,k=1,\ldots ,K\}\) are independent random vectors with
$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k}Var(\varvec{\eta }_{ik}|\mathcal {F}_{n})\nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}} -\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2\nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}} +O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) \end{aligned}$$
(7.12)
$$\begin{aligned}&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}}+o_P(1), \end{aligned}$$
(7.13)
where (7.12) and (7.13) hold by Assumptions (A.2) and (A.5), respectively. Meanwhile, for every \(\varepsilon > 0\),
$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k}E\{\Vert \varvec{\eta }_{ik}\Vert ^2I(\Vert \varvec{\eta }_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}\}\\&\quad \le \sum _{k=1}^{K}\sum _{i=1}^{r_k}E\Big \{\Vert \varvec{\eta }_{ik}\Vert ^2\cdot \frac{\Vert \varvec{\eta }_{ik}\Vert }{\varepsilon }\Big |\mathcal {F}_{n}\Big \}\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k}E(\Vert \varvec{\eta }_{ik}\Vert ^3|\mathcal {F}_{n})\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k}\frac{1}{n^3r_k^3}\sum _{i=1}^{n_k}\frac{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2}\\&\quad \le \frac{1}{\varepsilon }\sum _{k=1}^{K}\frac{1}{n^3r_k^2}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2}.\\ \end{aligned}$$
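The third-moment identity used in the display above can be verified directly. The sketch assumes, as an assumption of this note, that the draws within block \(k\) are made with replacement with selection probabilities \(\pi _{mk}\), and it reads off from (7.11) that \(\varvec{\eta }_{ik}=\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}^*_{ik}/(nr_k\pi ^*_{ik})\). The index \(m\) is introduced only to separate the full-data index from the draw index \(i\). For each draw,
$$\begin{aligned} E(\Vert \varvec{\eta }_{ik}\Vert ^3|\mathcal {F}_{n}) =\sum _{m=1}^{n_k}\pi _{mk}\,\frac{|Y_{mk}-P_{mk}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{mk}\Vert ^3}{n^3r_k^3\pi _{mk}^3} =\frac{1}{n^3r_k^3}\sum _{m=1}^{n_k}\frac{|Y_{mk}-P_{mk}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{mk}\Vert ^3}{\pi _{mk}^2}. \end{aligned}$$
This value does not depend on \(i\), so summing over the \(r_k\) draws multiplies it by \(r_k\) and produces the factor \(1/(n^3r_k^2)\); the final inequality then uses \(|Y_{mk}-P_{mk}(\hat{\varvec{\beta }}_\mathrm{MLE})|\le 1\).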
By Assumptions (A.5) and (A.6), we can derive that
$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k}E\{\Vert \varvec{\eta }_{ik}\Vert ^2I(\Vert \varvec{\eta }_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}\}\le \frac{1}{\varepsilon }O_P\Big (\sum _{k=1}^{K}\frac{n_k^3}{n^3r_k^2}\Big )\le \frac{1}{\varepsilon }O_P\Big (\sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\Big )=o_P(1). \end{aligned}$$
In view of (7.11) and (7.13), the Lindeberg–Feller central limit theorem (Proposition 2.27 of van der Vaart 1998) and Slutsky’s theorem yield that, conditional on \(\mathcal {F}_{n}\), as \(n \rightarrow \infty \) and \(r_k \rightarrow \infty \),
$$\begin{aligned} \frac{1}{n}\varvec{\Gamma }^{-1/2}\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$
(7.14)
From Lemma 1, (7.10) and Theorem 1,
$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
(7.15)
It can be checked that
$$\begin{aligned}&-\tilde{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}-(\tilde{\mathcal {H}}_{X}^{-1}-{\mathcal {H}}_{X}^{-1})\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+[{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}_{X}^{-1}]\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
Hence,
$$\begin{aligned} \tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-{\mathcal {H}}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
(7.16)
By Assumption (A.2), we have
$$\begin{aligned} \varvec{\Sigma }=\mathcal {H}_{X}^{-1}\varvec{\Gamma }\mathcal {H}_{X}^{-1}=O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) . \end{aligned}$$
(7.17)
Thus, (7.16) and (7.17) yield that
$$\begin{aligned}&\varvec{\Sigma }^{-1/2}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}\varvec{\Gamma }^{-1/2}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}\left( \sum _{k=1}^{K}\frac{n_k^2}{n^2r_k}\right) ^{1/2}\nonumber \\&\quad =-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}\varvec{\Gamma }^{1/2}\varvec{\Gamma }^{-1/2}\Big \{\frac{\dot{\ell }^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+o_P(1). \end{aligned}$$
(7.18)
Note that
$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}({\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2})^T={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}{\varvec{\Gamma }}^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}=\mathbf {I}.\nonumber \\ \end{aligned}$$
(7.19)
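The identity (7.19) is a one-line consequence of the definition \(\varvec{\Sigma }=\mathcal {H}_{X}^{-1}\varvec{\Gamma }\mathcal {H}_{X}^{-1}\) in (7.17). The sketch below assumes that \(\mathcal {H}_{X}\), \(\varvec{\Gamma }\) and \(\varvec{\Sigma }\) are symmetric and that symmetric square roots are used:
$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}{\varvec{\Gamma }}^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2} ={\varvec{\Sigma }}^{-1/2}\big ({\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}{\mathcal {H}}_{X}^{-1}\big ){\varvec{\Sigma }}^{-1/2} ={\varvec{\Sigma }}^{-1/2}{\varvec{\Sigma }}{\varvec{\Sigma }}^{-1/2} =\mathbf {I}. \end{aligned}$$
Consequently the matrix \({\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Gamma }}^{1/2}\) is orthogonal, so it maps a standard normal vector to a standard normal vector.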
By (7.14), (7.18), (7.19) and Slutsky’s theorem, we obtain that, as \(n \rightarrow \infty \),
$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}(\tilde{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$
This ends the proof. \(\square \)
Proof of Theorem 3
It can be shown that
$$\begin{aligned} tr(\varvec{\Gamma })= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}tr\left( \frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}_{ik}^T}{\pi _{ik}}\right) \nonumber \\= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}\nonumber \\= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left[ \sum _{i=1}^{n_k}\pi _{ik}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}}\right] \nonumber \\\ge & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k}\left[ \sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right] ^2 \end{aligned}$$
(7.20)
$$\begin{aligned}= & {} \frac{1}{n^2}\frac{1}{r}\sum _{k=1}^{K}r_k\sum _{k=1}^{K}\frac{\big [\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \big ]^2}{r_k}\nonumber \\\ge & {} \frac{1}{n^2r}\left[ \sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right] ^2, \end{aligned}$$
(7.21)
where (7.20) and (7.21) follow from the Cauchy–Schwarz inequality, with equality in (7.20) if and only if \(\pi _{ik}\propto |Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \), and equality in (7.21) if and only if \(r_k\propto \sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \). This ends the proof. \(\square \)
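The two Cauchy–Schwarz steps can be written out as follows. The shorthand \(a_{ik}=|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \) and \(A_k=\sum _{i=1}^{n_k}a_{ik}\) is introduced here for illustration, and the sketch uses \(\sum _{i=1}^{n_k}\pi _{ik}=1\) and \(\sum _{k=1}^{K}r_k=r\), as in the derivation above. For (7.20) and (7.21), respectively,
$$\begin{aligned} A_k^2&=\bigg (\sum _{i=1}^{n_k}\sqrt{\pi _{ik}}\cdot \frac{a_{ik}}{\sqrt{\pi _{ik}}}\bigg )^2\le \sum _{i=1}^{n_k}\pi _{ik}\sum _{i=1}^{n_k}\frac{a_{ik}^2}{\pi _{ik}},\\ \bigg (\sum _{k=1}^{K}A_k\bigg )^2&=\bigg (\sum _{k=1}^{K}\sqrt{r_k}\cdot \frac{A_k}{\sqrt{r_k}}\bigg )^2\le \sum _{k=1}^{K}r_k\sum _{k=1}^{K}\frac{A_k^2}{r_k}. \end{aligned}$$
Equality holds when the two vectors in each inner product are proportional, which after normalization gives
$$\begin{aligned} \pi _{ik}=\frac{a_{ik}}{A_k},\qquad r_k=r\,\frac{A_k}{\sum _{k'=1}^{K}A_{k'}}, \end{aligned}$$
where the expression for \(r_k\) ignores rounding to an integer.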
Next, we establish two lemmas that will be used in the proofs of Theorems 4 and 5.
Lemma 2
Under Assumptions (A.4) and (A.7), for \(l=2\) and \(l=4\),
$$\begin{aligned} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^l}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-1}), \end{aligned}$$
(7.22)
and
$$\begin{aligned} \frac{1}{n^3}\sum _{k=1}^{K}\frac{1}{r_k^2(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-2}). \end{aligned}$$
(7.23)
Proof
It follows from the expressions of \(r_k(\tilde{\varvec{\beta }}_0)\) and \(\pi _{ik}(\tilde{\varvec{\beta }}_0)\) that
$$\begin{aligned}&\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^l}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}\nonumber \\&\quad =\frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^l\nonumber \\&\quad =\frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^{l-1}}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}\cdot \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\quad \le \frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}(1+e^{\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}+e^{-\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0})\end{aligned}$$
(7.24)
$$\begin{aligned}&\quad \le \frac{1}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}(1+2e^{\lambda \Vert \mathbf {X}_{ik}\Vert })\nonumber \\&\quad \le \frac{3}{rn}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^{l-1}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }, \end{aligned}$$
(7.25)
where (7.24) holds by Assumption (A.4). Note that
$$\begin{aligned} E\{\Vert \mathbf {X}_{ik}\Vert ^{l-1}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\}\le \{E(\Vert \mathbf {X}_{ik}\Vert ^{2(l-1)})E(e^{2\lambda \Vert \mathbf {X}_{ik}\Vert })\}^{1/2}<\infty . \end{aligned}$$
(7.26)
Hence, (7.22) follows from (7.25), (7.26) and the law of large numbers. Analogously, we can prove that (7.23) holds. This ends the proof. \(\square \)
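The bound on \(1/|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\) behind (7.24) and (7.25) can be checked case by case. The sketch assumes the logistic form \(P_{ik}(\varvec{\beta })=e^{\mathbf {X}_{ik}^T\varvec{\beta }}/(1+e^{\mathbf {X}_{ik}^T\varvec{\beta }})\) with \(Y_{ik}\in \{0,1\}\), and reads \(\lambda \) as a bound with \(\Vert \tilde{\varvec{\beta }}_0\Vert \le \lambda \); both readings are assumptions of this note rather than statements taken from this appendix.
$$\begin{aligned} Y_{ik}=1:&\quad \frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}=\frac{1}{1-P_{ik}(\tilde{\varvec{\beta }}_0)}=1+e^{\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0},\\ Y_{ik}=0:&\quad \frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}=\frac{1}{P_{ik}(\tilde{\varvec{\beta }}_0)}=1+e^{-\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}. \end{aligned}$$
In either case the quantity is at most \(1+e^{\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}+e^{-\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0}\). By the Cauchy–Schwarz inequality, \(|\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_0|\le \lambda \Vert \mathbf {X}_{ik}\Vert \), so this is bounded by \(1+2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\le 3e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\), since \(e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\ge 1\).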
Lemma 3
If Assumptions (A.1), (A.4) and (A.7) hold, conditional on \(\mathcal {F}_n\) we have
$$\begin{aligned} \frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=O_{P|\mathcal {F}_{n}}(r^{-1/2}), \end{aligned}$$
(7.27)
and
$$\begin{aligned} \{\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\}^{-1} = O_{P|\mathcal {F}_{n}}(1), \end{aligned}$$
(7.28)
where \(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}=\frac{\partial ^2\ell ^*_{\tilde{\varvec{\beta }}_0}(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\).
Proof
For any \(\varvec{\beta }\in J_B \), we can derive that
$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\bigg \}=\frac{\dot{\ell }(\varvec{\beta })}{n}. \end{aligned}$$
(7.29)
For the jth component \(\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })\) of \(\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })\),
$$\begin{aligned}&E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })}{n}-\frac{\dot{\ell }_{j}(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\bigg \}^2\\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\bigg [\sum _{i=1}^{n_k}\frac{(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})^2_j}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}- \Big \{\sum _{i=1}^{n_k}(\{Y_{ik}-P_{ik}(\varvec{\beta })\}\mathbf {X}_{ik})_j\Big \}^2\bigg ]\\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^2}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}. \end{aligned}$$
By Lemma 2,
$$\begin{aligned} E\bigg \{\frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0j}(\varvec{\beta })}{n}-\frac{\dot{\ell }_j(\varvec{\beta })}{n}\bigg |\mathcal {F}_{n}\bigg \}^2 =O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
(7.30)
In view of Markov’s inequality and Assumption (A.1), (7.27) follows from (7.29) and (7.30).
In a similar manner, we obtain
$$\begin{aligned} E\Big \{\frac{\partial ^2\ell ^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n},\tilde{\varvec{\beta }}_0\Big \}=\frac{\partial ^2\ell (\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}. \end{aligned}$$
(7.31)
For any component \(\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^{*j_1j_2}(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) of \(\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\) with \(1\le j_1,j_2\le p\), it can be shown that
$$\begin{aligned}&E\Big \{\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0}^{*j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}-\frac{\partial ^2\ell _{j_1j_2}(\varvec{\beta })}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big |\mathcal {F}_{n}\Big \}^2 \nonumber \\&\quad =\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\bigg [\sum _{i=1}^{n_k}\frac{\{w^2_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\left( \sum _{i=1}^{n_k} \{w_{ik}(\varvec{\beta })\mathbf {X}_{ik}\mathbf {X}_{ik}^T\}_{j_1j_2}\right) ^2\bigg ]\nonumber \\&\quad \le \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^4}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}=O_{P|\mathcal {F}_{n}}(r^{-1}), \end{aligned}$$
(7.32)
where (7.32) holds by Lemma 2. From (7.31), (7.32) and Markov’s inequality, it follows that (7.28) holds. This ends the proof. \(\square \)
Proof of Theorem 4
It follows from (7.29) and (7.30) that given \(\mathcal {F}_{n}\),
$$\begin{aligned} \frac{\dot{\ell }^*_{\tilde{\varvec{\beta }}_0}(\varvec{\beta })}{n}-\frac{\dot{\ell }(\varvec{\beta })}{n}\rightarrow 0 \end{aligned}$$
in probability. Thus, conditional on \(\mathcal {F}_{n}\),
$$\begin{aligned} \Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert =o_P(1), \end{aligned}$$
(7.33)
which ensures that \(\breve{\varvec{\beta }}\) is close to \(\hat{\varvec{\beta }}_\mathrm{MLE}\) as long as \(r\) is large enough. Using Taylor’s theorem (Ferguson 1996, Chapter 4),
$$\begin{aligned} 0=\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\breve{\varvec{\beta }})}{n}=\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+\frac{\partial ^2\ell _{\tilde{\varvec{\beta }}_0j}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n\partial \varvec{\beta }\partial \varvec{\beta }^T}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})+\frac{1}{n}R_{\tilde{\varvec{\beta }}_0j}, \end{aligned}$$
(7.34)
where
$$\begin{aligned} R_{\tilde{\varvec{\beta }}_0j}=(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})^T\int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}). \end{aligned}$$
Note that for all \(\varvec{\beta }\),
$$\begin{aligned} \Big \Vert \frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^T}\Big \Vert= & {} \Big \Vert \sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{P^*_{ik}(\varvec{\beta })\{1-P^*_{ik}(\varvec{\beta })\}\{1-2P^*_{ik}(\varvec{\beta })\}}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)} \mathbf {X}^*_{ik}\mathbf {X}^{*T}_{ik}\mathbf {X}^*_{ik}\Big \Vert \\\le & {} \sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\Vert \mathbf {X}^*_{ik}\Vert ^3}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)}, \end{aligned}$$
and by Assumption (A.4),
$$\begin{aligned} P\left( \frac{1}{n}\sum _{k=1}^K\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\Vert \mathbf {X}^*_{ik}\Vert ^3}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)}\ge \tau \Big |\mathcal {F}_{n}\right)\le & {} \frac{\frac{1}{n}\sum _{k=1}^K\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3}{\tau }\rightarrow 0 \end{aligned}$$
in probability as \(\tau \rightarrow \infty \). Thus,
$$\begin{aligned} \left\| \int _0^1\int _0^1\frac{\partial ^2\dot{\ell }_{\tilde{\varvec{\beta }}_0j}^*\{\hat{\varvec{\beta }}_\mathrm{MLE}+uv(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})\}}{\partial \varvec{\beta }\partial \varvec{\beta }^T}vdudv\right\| =O_{P|\mathcal {F}_{n}}(n). \end{aligned}$$
(7.35)
By (7.34) and (7.35),
$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2)\Big \}. \end{aligned}$$
(7.36)
Based on (7.27), (7.28), (7.33) and (7.36), we have
$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})+o_{P|\mathcal {F}_{n}}(\Vert \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ). \end{aligned}$$
Hence, \(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=O_{P|\mathcal {F}_{n}}(r^{-1/2})\). This ends the proof. \(\square \)
Proof of Theorem 5
Let
$$\begin{aligned} \frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}=\frac{1}{n}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{\{Y^*_{ik}-P^*_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}^*_{ik}}{\pi ^*_{ik}(\tilde{\varvec{\beta }}_0)} =\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}.\nonumber \\ \end{aligned}$$
(7.37)
Given \(\mathcal {F}_{n}\) and \(\tilde{\varvec{\beta }}_0\), we know that \(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\) are independent random variables with
$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}Var(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}^T_{ik}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}\nonumber \\&-\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2.\nonumber \\ \end{aligned}$$
(7.38)
Note that
$$\begin{aligned}&\frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\left( \sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik}\right) ^2\\&\quad =\frac{1}{rn^2}\sum _{k=1}^{K}\frac{(\sum _{i=1}^{n_k}\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}\mathbf {X}_{ik})^2}{\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \\&\quad \le \frac{1}{rn^2}\sum _{k=1}^{K}\frac{(\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert )^2}{\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \\&\quad =\frac{1}{rn^2}\left( \sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \right) ^2\\&\quad \le \frac{1}{r}\left( \frac{1}{n}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \right) ^2\\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
By (7.38) and the bound above, as \(r\rightarrow \infty \),
$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}Var(\varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)= & {} \frac{1}{n^2}\sum _{k=1}^{K}\frac{1}{r_k(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\{Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\}^2\mathbf {X}_{ik}\mathbf {X}^T_{ik}}{\pi _{ik}(\tilde{\varvec{\beta }}_0)}+O_{P|\mathcal {F}_{n}}(r^{-1})\nonumber \\= & {} \varvec{\Gamma }^{\tilde{\varvec{\beta }}_{0}}+o_P(1). \end{aligned}$$
(7.39)
Meanwhile, for every \(\varepsilon > 0\),
$$\begin{aligned}&\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2I(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\}\\&\quad \le \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\Big \{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2\cdot \frac{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert }{\varepsilon }\Big |\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\Big \}\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^3|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0)\\&\quad =\frac{1}{\varepsilon }\sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}\frac{1}{n^3r_k^3(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|^3\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}\\&\quad \le \frac{1}{\varepsilon }\frac{1}{n^3}\sum _{k=1}^{K}\frac{1}{r_k^2(\tilde{\varvec{\beta }}_0)}\sum _{i=1}^{n_k}\frac{\Vert \mathbf {X}_{ik}\Vert ^3}{\pi _{ik}^2(\tilde{\varvec{\beta }}_0)}. \end{aligned}$$
By Lemma 2, as \(r\rightarrow \infty \), we have
$$\begin{aligned} \sum _{k=1}^{K}\sum _{i=1}^{r_k(\tilde{\varvec{\beta }}_0)}E\{\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert ^2I(\Vert \varvec{\eta }^{\tilde{\varvec{\beta }}_0}_{ik}\Vert >\varepsilon )|\mathcal {F}_{n}, \tilde{\varvec{\beta }}_0\}\le \frac{1}{\varepsilon }O_{P|\mathcal {F}_{n}}(r^{-2})=o_P(1). \end{aligned}$$
(7.40)
It follows from (7.37), (7.39) and (7.40), together with the Lindeberg–Feller central limit theorem (Proposition 2.27 of van der Vaart 1998) and Slutsky’s theorem, that conditional on \(\mathcal {F}_{n}\), as \(n \rightarrow \infty \) and \(r \rightarrow \infty \),
$$\begin{aligned} \frac{1}{n}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{-1/2}\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$
(7.41)
By Lemma 3, (7.36) and Theorem 5, we get that
$$\begin{aligned} \breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}=-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}+O_{P|\mathcal {F}_{n}}(r^{-1})\Big \}. \end{aligned}$$
(7.42)
Note that
$$\begin{aligned}&-\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}-(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}-{\mathcal {H}}_{X}^{-1})\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+\Bigg [{\mathcal {H}}_{X}^{-1}(\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}-{\mathcal {H}}_{X})\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0-1}_{X}\Bigg ]\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}(r^{-1/2})O_{P|\mathcal {F}_{n}}(1)O_{P|\mathcal {F}_{n}}(r^{-1/2})\\&\quad =-{\mathcal {H}}_{X}^{-1}\Bigg \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Bigg \}+O_{P|\mathcal {F}_{n}}(r^{-1}). \end{aligned}$$
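The second equality in the display above uses a standard identity for the difference of two inverses. As a sketch, for generic invertible matrices \(A\) and \(B\) of the same dimension (symbols introduced here only for illustration),
$$\begin{aligned} B^{-1}(B-A)A^{-1}=B^{-1}BA^{-1}-B^{-1}AA^{-1}=A^{-1}-B^{-1},\quad \text {so that}\quad -(A^{-1}-B^{-1})=B^{-1}(A-B)A^{-1}. \end{aligned}$$
Taking \(A=\tilde{\mathcal {H}}^{\tilde{\varvec{\beta }}_0}_{X}\) and \(B={\mathcal {H}}_{X}\) gives the bracketed factor.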
Hence, (7.42) and (7.22) yield that
$$\begin{aligned}&\varvec{\Sigma }^{-1/2}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE})=-\varvec{\Sigma }^{-1/2}\mathcal {H}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{-1/2}\Big \{\frac{\dot{\ell }_{\tilde{\varvec{\beta }}_0}^*(\hat{\varvec{\beta }}_\mathrm{MLE})}{n}\Big \}+O_{P|\mathcal {F}_{n}}(r^{-1/2}). \end{aligned}$$
It can be proved that
$$\begin{aligned}&{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\{{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\}^{T}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad ={\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}\varvec{\Gamma }{\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}+{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1} (\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\nonumber \\&\quad =\mathbf {I}+{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1} (\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}. \end{aligned}$$
(7.43)
For the distance between \(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}\) and \(\varvec{\Gamma }\), we have
$$\begin{aligned} \Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \le \frac{1}{n^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2\left| \frac{1}{r_k(\tilde{\varvec{\beta }}_0)\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\frac{1}{r_k(\hat{\varvec{\beta }}_\mathrm{MLE})\pi _{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})}\right| . \end{aligned}$$
(7.44)
A straightforward calculation yields that
$$\begin{aligned}&\Big |\frac{1}{r_k(\tilde{\varvec{\beta }}_0)\pi _{ik}(\tilde{\varvec{\beta }}_0)}-\frac{1}{r_k(\hat{\varvec{\beta }}_\mathrm{MLE})\pi _{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})}\Big |\nonumber \\&\quad =\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert \cdot r}-\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert \cdot r}\bigg |\nonumber \\&\quad \le \frac{1}{r}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert } -\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\bigg |\nonumber \\&\qquad +\frac{1}{r}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert } -\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }\bigg |\nonumber \\&\quad \le \frac{1}{r}\bigg |\frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}-\frac{1}{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|}\bigg |\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert }{\Vert \mathbf {X}_{ik}\Vert }\nonumber \\&\qquad +\frac{1}{r}\frac{\sum _{k=1}^{K}\sum _{i=1}^{n_k}|P_{ik}(\tilde{\varvec{\beta }}_0)-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|\Vert \mathbf {X}_{ik}\Vert }. \end{aligned}$$
(7.45)
Note that
$$\begin{aligned}&\bigg |\frac{1}{|Y_{ik}-P_{ik}(\tilde{\varvec{\beta }}_0)|}-\frac{1}{|Y_{ik}-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})|}\bigg |\nonumber \\&\quad =\big |e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\hat{\varvec{\beta }}_\mathrm{MLE}}-e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\tilde{\varvec{\beta }}_{0}}\big |\nonumber \\&\quad \le e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert +e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^2\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2, \end{aligned}$$
(7.46)
and
$$\begin{aligned}&\big |P_{ik}(\tilde{\varvec{\beta }}_0)-P_{ik}(\hat{\varvec{\beta }}_\mathrm{MLE})\big |\nonumber \\&\quad =\frac{\big |e^{\tilde{\varvec{\beta }}_{0}^T\mathbf {X}_{ik}}-e^{\hat{\varvec{\beta }}_\mathrm{MLE}^T\mathbf {X}_{ik}}\big |}{(1+e^{\tilde{\varvec{\beta }}_{0}^T\mathbf {X}_{ik}})(1+e^{\hat{\varvec{\beta }}_\mathrm{MLE}^T\mathbf {X}_{ik}})}\nonumber \\&\quad \le e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert +e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert ^2\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2. \end{aligned}$$
(7.47)
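The first equality in (7.46) can be checked directly, assuming the logistic form \(P_{ik}(\varvec{\beta })=e^{\mathbf {X}_{ik}^T\varvec{\beta }}/(1+e^{\mathbf {X}_{ik}^T\varvec{\beta }})\) that is implicit in (7.47). For the two possible values of \(Y_{ik}\),
$$\begin{aligned} Y_{ik}=1:&\quad \frac{1}{|Y_{ik}-P_{ik}(\varvec{\beta })|}=\frac{1}{1-P_{ik}(\varvec{\beta })}=1+e^{\mathbf {X}_{ik}^T\varvec{\beta }},\\ Y_{ik}=0:&\quad \frac{1}{|Y_{ik}-P_{ik}(\varvec{\beta })|}=\frac{1}{P_{ik}(\varvec{\beta })}=1+e^{-\mathbf {X}_{ik}^T\varvec{\beta }}, \end{aligned}$$
so in both cases \(1/|Y_{ik}-P_{ik}(\varvec{\beta })|=1+e^{(2Y_{ik}-1)\mathbf {X}_{ik}^T\varvec{\beta }}\), and the constant 1 cancels when the difference between \(\tilde{\varvec{\beta }}_0\) and \(\hat{\varvec{\beta }}_\mathrm{MLE}\) is taken.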
It follows from (7.44)–(7.47) that
$$\begin{aligned}&\Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \nonumber \\&\quad \le \frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\qquad +\frac{1}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2\sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert \nonumber \\&\qquad +\frac{3}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^2e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert \nonumber \\&\qquad +\frac{3}{rn^2}\sum _{k=1}^{K}\sum _{i=1}^{n_k}e^{\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \mathbf {X}_{ik}\Vert \sum _{k=1}^{K}\sum _{i=1}^{n_k}\Vert \mathbf {X}_{ik}\Vert ^3e^{2\lambda \Vert \mathbf {X}_{ik}\Vert }\Vert \tilde{\varvec{\beta }}_{0}-\hat{\varvec{\beta }}_\mathrm{MLE}\Vert ^2\nonumber \\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})+O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1})\nonumber \\&\quad =O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2}). \end{aligned}$$
(7.48)
By (7.43) and (7.48),
$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\{{\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0})^{1/2}\}^{T}=\mathbf {I}+O_{P|\mathcal {F}_{n}}(r_0^{-1/2}). \end{aligned}$$
(7.49)
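The rate in (7.49) can be traced as follows. This is a sketch under two assumptions: that \(\varvec{\Sigma }^{-1/2}=O_{P|\mathcal {F}_{n}}(r^{1/2})\) and \({\mathcal {H}}_{X}^{-1}=O_{P|\mathcal {F}_{n}}(1)\), which is what the step from (7.42) to the display preceding (7.43) appears to use, and that \(\Vert \cdot \Vert \) is a submultiplicative matrix norm. Then, by (7.48),
$$\begin{aligned} \Vert {\varvec{\Sigma }}^{-1/2}{\mathcal {H}}_{X}^{-1}(\varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }){\mathcal {H}}_{X}^{-1}{\varvec{\Sigma }}^{-1/2}\Vert&\le \Vert {\varvec{\Sigma }}^{-1/2}\Vert ^2\Vert {\mathcal {H}}_{X}^{-1}\Vert ^2\Vert \varvec{\Gamma }^{\tilde{\varvec{\beta }}_0}-\varvec{\Gamma }\Vert \\&=O_{P|\mathcal {F}_{n}}(r)\,O_{P|\mathcal {F}_{n}}(1)\,O_{P|\mathcal {F}_{n}}(r^{-1}r_0^{-1/2})=O_{P|\mathcal {F}_{n}}(r_0^{-1/2}). \end{aligned}$$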
By (7.49) and Slutsky’s theorem, as \(r_0 \rightarrow \infty \), \(r \rightarrow \infty \) and \(n \rightarrow \infty \), we obtain
$$\begin{aligned} {\varvec{\Sigma }}^{-1/2}(\breve{\varvec{\beta }}-\hat{\varvec{\beta }}_\mathrm{MLE}){\mathop {\longrightarrow }\limits ^{d}} N(0,\mathbf {I}). \end{aligned}$$
This completes the proof. \(\square \)