Abstract
In nonregular problems where the conventional \(n\) out of \(n\) bootstrap is inconsistent, the \(m\) out of \(n\) bootstrap provides a useful remedy for restoring consistency. Conventionally, the optimal bootstrap sample size \(m\) is taken to be the minimiser of a frequentist error measure, estimation of which has posed a major difficulty that hinders practical application of the \(m\) out of \(n\) bootstrap. Relatively little attention has been paid to a stronger, stochastic, notion of the optimal bootstrap sample size, defined as the minimiser of an error measure calculated directly from the observed sample. Motivated by this stronger notion of optimality, we develop procedures for calculating the stochastically optimal value of \(m\). Our procedures are shown to work under special forms of Edgeworth-type expansions, which are typically satisfied by statistics of the shrinkage type. Theoretical and empirical properties of our methods are illustrated with three examples: the James–Stein estimator, the ridge regression estimator and the post-model-selection regression estimator.
References
Ahmed, S.E., Saleh, A.K., Md, E., Volodin, A.I., Volodin, I.N.: Asymptotic expansion of the coverage probability of James–Stein estimators. Theory Probab. Appl. 51, 1–14 (2007)
Ahmed, S.E., Volodin, A.I., Volodin, I.N.: High order approximation for the coverage probability by a confident set centered at the positive-part James–Stein estimator. Stat. Probab. Lett. 79, 1823–1828 (2009)
Bickel, P., Götze, F., van Zwet, W.: Resampling fewer than \(n\) observations: gains, losses and remedies for losses. Stat. Sinica 7, 1–31 (1997)
Bickel, P.J., Sakov, A.: Extrapolation and the bootstrap. Sankhyā Ser. A 64, 640–652 (2002)
Bickel, P., Sakov, A.: On the choice of \(m\) in the \(m\) out of \(n\) bootstrap and its application to confidence bounds for extrema. Stat. Sinica 18, 967–985 (2008)
Claeskens, G., Hjort, N.L.: Model Selection and Model Averaging. Cambridge University Press, Cambridge (2008)
Datta, S., McCormick, W.P.: Bootstrap inference for a first-order autoregression with positive innovations. J. Am. Stat. Assoc. 90, 1289–1300 (1995)
Götze, F.: Asymptotic approximations and the bootstrap. IMS Bulletin, 56th AMS-Meeting, p. 305 (1993)
Götze, F., Račkauskas, A.: Adaptive choice of bootstrap sample sizes. In: State of the Art in Probability and Statistics, pp. 286–309. IMS Publications, London (2001)
Hall, P.: The Bootstrap and Edgeworth Expansion. Springer, New York (1992)
Hall, P., Horowitz, J.L., Jing, B.: On blocking rules for the bootstrap with dependent data. Biometrika 82, 561–574 (1995)
Hoerl, A.E., Kennard, R.W., Baldwin, K.F.: Ridge regression: some simulations. Commun. Stat. 4, 105–123 (1975)
Lahiri, S.N.: On bootstrapping M-estimators. Sankhyā Ser. A 54, 157–170 (1992)
Mammen, E.: When Does Bootstrap Work: Asymptotic Results and Simulations. Lecture Notes in Statistics, vol. 77. Springer, New York, Heidelberg (1992)
Politis, D.N., Romano, J.P., Wolf, M.: Subsampling. Springer, New York (1999)
Additional information
Supported by Grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Nos. HKU 703207P and HKU 702508P).
Appendix
1.1 Proof of Proposition 1
For \(\rho \left( \mathcal {L}_{m,n}^*-\mathcal {L}_n\right) \) to converge to zero in probability, it is necessary that \(m=o(n)\). It then follows from (A.0) that
It is clear that \(\rho \left( m^{-\alpha } \hat{A}_1+ (m/n)^{\beta }\hat{A}_2\right) \) is minimised at
so that
If \(\delta _{n,F}=o(n^{-\alpha \beta /(\alpha +\beta )})\), then necessarily \(\delta _{n,F}=o_p\left( \rho \left( m^{-\alpha } \hat{A}_1+(m/n)^{\beta }\hat{A}_2\right) \right) \) for any \(m\), so that, using (6), (7) and (8),
which implies \(m_{ opt }=m'_{ opt }(1+o_p(1))\) and the results of part (i) follow.
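The minimisation underlying \(m'_{ opt }\) can be illustrated numerically. The following sketch (not from the paper) takes \(\rho \) to be the absolute value and uses placeholder values for \(\hat{A}_1\) and \(\hat{A}_2\); with \(\alpha =1/2\) and \(\beta =1\), the minimiser grows at the rate \(n^{\beta /(\alpha +\beta )}=n^{2/3}\) predicted by Proposition 1:

```python
import numpy as np

def m_opt_grid(A1_hat, A2_hat, n, alpha, beta):
    """Minimise the error proxy rho(m^{-alpha} A1_hat + (m/n)^beta A2_hat)
    over integer bootstrap sample sizes m = 1, ..., n, taking rho = |.|.
    A1_hat and A2_hat stand in for the estimated expansion coefficients
    of (A.0); their exact form depends on the statistic at hand."""
    m = np.arange(1, n + 1, dtype=float)
    err = np.abs(A1_hat * m ** (-alpha) + (m / n) ** beta * A2_hat)
    return int(m[np.argmin(err)])

# With alpha = 1/2, beta = 1 and n = 10000, the minimiser lies near
# (alpha/beta)^{2/3} * n^{2/3}, i.e. roughly 0.63 * 464 ≈ 292.
m_hat = m_opt_grid(A1_hat=1.0, A2_hat=1.0, n=10000, alpha=0.5, beta=1.0)
```

The grid search is a stand-in for the closed-form minimiser; in practice one would restrict the grid to a coarser sequence of candidate values of \(m\).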
Under the assumption of part (ii), we have \(\rho \left( m^{-\alpha } \hat{A}_1+(m/n)^{\beta }\hat{A}_2\right) =O_p(\delta _{n,F})\) whenever
and \(\rho \left( m^{-\alpha }\hat{A}_1+(m/n)^{\beta }\hat{A}_2\right) \) has an order exceeding \(\delta _{n,F}\) if (9) fails to hold. It thus follows, using (6), that \(\rho \left( \mathcal {L}_{m_{ opt },n}^*-\mathcal {L}_n\right) =\underset{m}{\min }\,\rho \left( \mathcal {L}_{m,n}^*-\mathcal {L}_n\right) =O_p\left( \delta _{n,F}\right) \), with \(m=m_{ opt }\) satisfying (9). The last assertion of (ii) follows trivially by noting (7) and that \(m=\Omega _p\left( n^{\beta /(\alpha +\beta )}\right) \) satisfies (9), which completes the proof of part (ii).
1.2 Proof of Proposition 2
In view of Proposition 1, it suffices to prove that \(\hat{m}=m'_{ opt }(1+o_p(1))\) for \(\hat{m}=\hat{m}_k^{(1)}, \hat{m}_k^{(2)}\) and \(\hat{m}_u\).
Consider first the case \(\hat{m}=\hat{m}_u\). It follows from the proof of Theorem 3 of Bickel and Sakov (2008) that \(\underset{m}{\inf }\, \rho (\mathcal {L}_{m,n}^*-\mathcal {L}_n)=\Omega _p(n^{-\alpha \beta /(\alpha +\beta )})\) and \(\hat{m}_u=\Omega _p(m_{ opt })=\Omega _p(n^{\beta /(\alpha +\beta )})\). Under (A.0) and the conditions assumed on the \(m_i\)’s, we have
so that \(\hat{\alpha }(\cdot )=\alpha \left\{ 1+o_p((\ln n)^{-1})\right\} \). It follows that
Similar arguments show that \(\hat{\beta }(\cdot )=\beta \left\{ 1+o_p((\ln n)^{-1})\right\} \), and hence
Using (A.0) again, we have
which implies that \(\hat{m}_u=m'_{ opt }(1+o_p(1))\). The same result holds for \(\hat{m}_k^{(1)}\) as an immediate corollary.
To show that \(\hat{m}^{(2)}_k=m'_{ opt }(1+o_p(1))\), we note first that, for bootstrap sample sizes \(M_1,M_2\) satisfying \(M_i=\Omega (n^{\beta /(\alpha +\beta )})\), we have, according to (A.0), that
Setting \((M_1,M_2)\) to be \((m_1,m_2)\) and \((m_2,m_3)\) in the above expansion yields two equations from which we can obtain solutions, up to order \(o_p(1)\), for \(\hat{A}_1(\cdot )\) and \(\hat{A}_2(\cdot )\). The objective functions used for defining \(m_{ opt }'\) and \(\hat{m}^{(2)}_k\) are therefore asymptotically equivalent to first order, so that \(\hat{m}^{(2)}_k=m'_{ opt }(1+o_p(1))\). This completes our proof.
1.3 Proof of (A.0) for Example 3.1
Define \(\mathcal{Z}_n=n^{1/2}\Sigma ^{-1/2}(\bar{X}-\theta _n)\) and \(\mathcal{V}_n=n^{1/2}(\hat{\Sigma }-\Sigma )\). Note that \(\mathcal{Z}_n\) converges weakly to a \(p\)-variate standard normal random vector \(\mathcal{Z}\). For any \(\psi \in {{\mathbb {R}}}^p\), define \(\mathcal{R}_n(\psi ) =\left( \Sigma +n^{-1/2}\mathcal{V}_n\right) ^{-1/2} \left( \psi + \Sigma ^{1/2}\mathcal{Z}_n\right) \), and \(\mathcal {J}_n(\cdot |\psi )\) to be the distribution function of
Under suitable moment and Cramér’s conditions on \(F\), Hall (1992) derives Edgeworth expansions under a general smooth function model setting. The method can be applied to establish an Edgeworth expansion for the joint density of \((\mathcal{Z}_n,\mathcal{V}_n)\), which can then be integrated to provide an expansion for \(\mathcal {J}_n(\cdot |\psi )\) of the form
uniformly over \(x>0\) and \(\psi \) in an open neighbourhood of 0, where \(\mathcal {J}_{\infty }(\cdot |\psi )\) and \(\mathcal {J}_{\infty ,1}(\cdot |\psi )\) are functions depending smoothly on \(\psi \) and the moments of \(F\). Ahmed et al. (2007) show that when \(\psi \rightarrow 0\), \(\mathcal {J}_\infty (\cdot |\psi )\) depends on \(\psi \) through \(\psi ^{\mathrm{T}}\Sigma ^{-1}\psi \). Writing \(J_\infty (\cdot |\psi ^{\mathrm{T}}\Sigma ^{-1}\psi )\) for \(\mathcal {J}_\infty (\cdot |\psi )\), setting \(\psi =n^{1/2}\theta _n\rightarrow 0\) and Taylor expanding (10) about \(\psi =0\), we obtain an expansion for \(\mathcal {L}_n(c^2)\), given by
where \(\partial J_\infty (\cdot |0)=\left. (\partial /\partial \tau )J_\infty (\cdot |\tau )\right| _{\tau =0}\). It follows that \(c^2\) should be set at \(J_\infty ^{-1}(1-\kappa |0)+o_p(1)\) for \(D_k(c^2)\) to be asymptotically correct. The bootstrap distribution \(\mathcal {L}^*_{m,n}(\cdot )\) has an expansion given by the sample version of (10) at \(\psi =m^{1/2}\bar{X}\), that is
where \(\hat{\mathcal {J}}_\infty (\cdot |\cdot )\) and \(\hat{\mathcal {J}}_{\infty ,1}(\cdot |\cdot )\) are sample versions of \(\mathcal {J}_\infty (\cdot |\cdot )\) and \(\mathcal {J}_{\infty ,1}(\cdot |\cdot )\), respectively, obtained by replacing population with sample moments of \(F\) in the definitions of the latter. For the case \(m=n\), \(m^{1/2}\bar{X}=\Sigma ^{1/2}\mathcal{Z}_n+o_p(1)\), so that, by (12), \(\mathcal {L}^*_{n,n}(\cdot )\) converges in probability to a random distribution function \(\mathcal {J}_\infty (\cdot |\Sigma ^{1/2}\mathcal{Z})\). It follows that \(\mathcal {L}^{*-1}_{n,n}(1-\kappa )\) fails to converge in probability to the correct limit \(J_\infty ^{-1}(1-\kappa |0)\). For the case \(m\rightarrow \infty \) and \(m=o(n)\), we have \(m^{1/2}\bar{X}=m^{1/2}n^{-1/2}\left( n^{1/2}\theta _n+\Sigma ^{1/2}\mathcal{Z}_n\right) =o_p(1)\). As in (11), we can expand (12) to obtain
which converges in probability to \(J_\infty (x|0)\), inversion of which yields the asymptotically correct limit. It is clear from (11) and (13) that (A.0) holds with \(\varepsilon _n(F)=\partial J_\infty (\cdot |0)\left( n\,\theta _n^{\mathrm{T}}\Sigma ^{-1}\theta _n\right) (1+o(1))\), \(\alpha =1/2\), \(\beta =1\) and \(\lambda =\infty \), so that \(\delta _{n,F}=|\varepsilon _n(F)|\).
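The \(m\) out of \(n\) bootstrap of a James–Stein-type statistic can be simulated directly. The sketch below (an illustration only, assuming identity covariance for simplicity rather than the exact setting of Example 3.1) draws bootstrap replicates of the pivot \(m\Vert \theta ^*-\hat{\theta }\Vert ^2\) based on the positive-part James–Stein estimator:

```python
import numpy as np

def js_positive_part(xbar, n, p, sigma2=1.0):
    """Positive-part James-Stein shrinkage of the sample mean
    (identity covariance assumed for simplicity)."""
    norm2 = n * np.dot(xbar, xbar) / sigma2
    shrink = max(0.0, 1.0 - (p - 2) / norm2)
    return shrink * xbar

def m_out_of_n_boot(X, m, B, rng=None):
    """m out of n bootstrap replicates of m * ||JS(resample) - JS(full)||^2,
    a bootstrap analogue of the squared-error pivot."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    theta_hat = js_positive_part(X.mean(axis=0), n, p)
    reps = np.empty(B)
    for b in range(B):
        Xb = X[rng.integers(0, n, size=m)]   # resample m rows with replacement
        tb = js_positive_part(Xb.mean(axis=0), m, p)
        reps[b] = m * np.sum((tb - theta_hat) ** 2)
    return reps
```

Comparing the empirical distribution of `reps` across choices of `m` reproduces the inconsistency at \(m=n\) and the consistency for \(m=o(n)\) described above.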
1.4 Proof of (A.0) for Example 3.2
Define \(Z_n=n^{1/2}(b(0)-\xi _n), V_n=n^{1/2}\left( n^{-1}X^{\mathrm{T}}X-V_0\right) \) and \(w_n=n^{1/2}(s^2-\sigma ^2)\). For any \(\psi \in {{\mathbb {R}}}^p\), define \(\mathcal {H}_n(\cdot |\psi )\) to be the distribution function of
so that \(\mathcal {L}_n(\cdot )=\mathcal {H}_n(\cdot |n^{1/2}\xi _n)\). Edgeworth expansions developed by Lahiri (1992) for M-estimators can be adapted to show that, under regularity conditions on the joint distribution of \((X_1,Y_1)\), \(\mathcal {H}_n(\cdot |\psi )\) admits an Edgeworth-type expansion of the form
uniformly over \(x\in {{\mathbb {R}}}\) and \(\psi \) in an open neighbourhood of 0, where \(\mathcal {H}_\infty (x|\psi )\) and \(\mathcal {H}_{\infty ,1}(x|\psi )\) depend smoothly on \(\psi \) and the moments of \((X_1,Y_1)\). Letting \(\nabla \mathcal {H}_\infty (\cdot |0)=\left. (\partial /\partial \psi )\mathcal {H}_\infty (\cdot | \psi )\right| _{\psi =0}\), setting \(\psi =n^{1/2}\xi _n\) and Taylor expanding about \(\psi =0\), we have
The \(m\) out of \(n\) bootstrap analogues of \(Z_n, V_n\) and \(w_n\) are given respectively by \(Z^*_m=m^{1/2}(b^*_m(0)-b(0)), V^*_m=m^{1/2}\left( m^{-1}X^{*\text {T}} X^*-n^{-1}X^{\mathrm{T}}X\right) \) and \(w^*_m=m^{1/2}(s^{*2}_m-s^2)\). Drawing on the analogy between (14) and
we deduce an expansion analogous to (15) for \(\mathcal {L}^*_{m,n}\), given by
where \(\hat{\mathcal {H}}_\infty (\cdot |\cdot )\) and \(\hat{\mathcal {H}}_{\infty ,1}(\cdot |\cdot )\) are obtained from \(\mathcal {H}_\infty (\cdot |\cdot )\) and \(\mathcal {H}_{\infty ,1}(\cdot |\cdot )\), respectively, by substituting sample moments of \((X,Y)\) for their population moments in the definitions of the latter. If \(m=n\), we have \(n^{1/2}b(0)=Z_n+o(1)\) converging in distribution to \(Z\sim N(0,\sigma ^2V_0^{-1})\), so that, by (17), \(\mathcal {L}^*_{n,n}\) converges in probability to a random distribution function \(\mathcal {H}_\infty (\cdot |Z)\), which fails to estimate \(\mathcal {L}_n(\cdot )\) consistently. If \(m\rightarrow \infty \) and \(m=o(n)\), then \(m^{1/2}b(0)=m^{1/2}n^{-1/2}\left( n^{1/2}\xi _n+Z_n\right) =o_p(1)\), which yields for \(\mathcal {L}^*_{m,n}\) an expansion analogous to (16), given by
which converges in probability to the correct limit \(\mathcal {H}_\infty (x|0)\). Note that the expansions (16) and (18) satisfy (A.0) with \(\varepsilon _n(F)=n^{1/2}\nabla \mathcal {H}_\infty (x|0)^{\mathrm{T}}\xi _n(1+o(1))\) and \(\alpha =\beta =\lambda =1/2\), so that \(\delta _{n,F}= \max \left\{ |\varepsilon _n(F)|,n^{-1/2}\right\} \).
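A concrete version of the \(m\) out of \(n\) pairs bootstrap for a ridge-type estimator can be sketched as follows. The ridge parameter here is the data-driven Hoerl–Kennard–Baldwin choice \(k=p\,s^2/\Vert b_{ OLS }\Vert ^2\) from Hoerl et al. (1975), which is an assumption on our part; the exact estimator of Example 3.2 may differ:

```python
import numpy as np

def ridge_hkb(X, y):
    """Ridge estimator with the Hoerl-Kennard-Baldwin data-driven
    ridge parameter k = p * s^2 / ||b_OLS||^2."""
    n, p = X.shape
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ b_ols) ** 2) / (n - p)   # residual variance
    k = p * s2 / np.dot(b_ols, b_ols)
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_m_boot(X, y, m, B, rng=None):
    """m out of n pairs bootstrap of sqrt(m) * (ridge* - ridge)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    b_hat = ridge_hkb(X, y)
    reps = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=m)          # resample (X_i, Y_i) pairs
        reps[b] = np.sqrt(m) * (ridge_hkb(X[idx], y[idx]) - b_hat)
    return reps
```

Because the shrinkage amount is itself random, the \(n\) out of \(n\) version of this bootstrap exhibits the inconsistency described above, whereas \(m=o(n)\) restores the correct limit.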
1.5 Proof of (A.0) for Example 3.3
Write, for brevity, \(\Delta g_{ij}(w_1,w_2)=g_i(w_1)-g_j(w_2)\) for any \(w_1,w_2\in {{\mathbb {R}}}^5, i,j=0,1,2\). Define, for \(c_1,c_2,c_3\in {{\mathbb {R}}}\),
so that \(\mathcal {L}_n(\cdot )=\mathcal {K}_n\left( \cdot \,,\,\cdot -n^{1/2}\beta _{0, n}\rho _W,\,n^{1/2}\beta _{0,n}\right) \), where \(\rho _W=\mathbb {E}[X_1]/\mathbb {E}[X^2_1]\). Edgeworth expansions under the smooth function model can be applied to the random vector \([g_0(\bar{W}),g_1(\bar{W}),g_2(\bar{W})]^{\mathrm{T}}\) to obtain
uniformly over \((c_1,c_2,c_3)\in {{\mathbb {R}}}^3\), where \(\mathcal {K}_\infty (\cdot )\) and \(\mathcal {K}_{\infty ,1}(\cdot )\) are smooth functions depending on the moments of \((X_1,Y_1)\). Setting \((c_1,c_2,c_3)=\left( t,\,t-n^{1/2} \beta _{0,n}\rho _W,\,n^{1/2}\beta _{0,n}\right) \) in (19) and noting that \(n^{1/2}\beta _{0,n}=o(1)\), we have
where \(\partial _j\mathcal {K}_\infty \) denotes the partial derivative of \(\mathcal {K}_\infty \) with respect to its \(j\)th argument. Denote by \(\hat{\mathcal {K}}_{\infty }\) and \(\hat{\mathcal {K}}_{\infty ,1}\) the sample analogues of \(\mathcal {K}_\infty \) and \(\mathcal {K}_{\infty ,1}\), respectively. The \(m\) out of \(n\) bootstrap version of \(\mathcal {K}_n\) has an expansion analogous to (19), given by
which converges in probability to \(\mathcal {K}_\infty (c_1,c_2,c_3)\). Note that
By the Central Limit Theorem, \(n^{1/2}[\,\Delta g_{00}(\bar{W}, \mu _W),\,\Delta g_{11}(\bar{W},\mu _W),\,\Delta g_{22}(\bar{W},\mu _W)\,]^{\mathrm{T}}\) converges in distribution to a trivariate normal random vector \(\mathcal {W}=[\mathcal {W}_0,\mathcal {W}_1,\mathcal {W}_2]^{\mathrm{T}}\). For the conventional bootstrap with \(m=n\), we have that
converges in distribution to \(\left[ \mathcal {W}_1- \mathcal {W}_2, \mathcal {W}_0\right] \). Thus the bootstrap distribution function (22) converges in probability to a random function
which fails to capture the correct limit \(\mathcal {K}_\infty (t,t,0)\). For \(m=o(n)\) and \(m\rightarrow \infty \), both \(m^{1/2}\Delta g_{12}(\bar{W},\bar{W})\) and \(m^{1/2}g_0(\bar{W})\) are of order \(\Omega _p(m^{1/2}n^{-1/2})=o_p(1)\). It follows, by (21) and Taylor expansion, that (22) can be expanded as
which converges to the correct limit \(\mathcal {K}_\infty (t,t,0)\). Note that (A.0) is satisfied by the expansions (20) and (23), which have the same forms as (16) and (18), respectively, established in Sect. 6.4.
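A post-model-selection estimator of the kind studied in Example 3.3 can be illustrated by a simple pretest rule. The sketch below (a hedged illustration in the spirit of the example, not the paper's exact construction) tests the intercept in \(y=\beta _0+\beta _1 x+\varepsilon \) and, if it is insignificant, refits the slope through the origin, whose population target is \(\mathbb {E}[X_1Y_1]/\mathbb {E}[X_1^2]\):

```python
import numpy as np

def post_selection_slope(x, y, crit=1.96):
    """Pretest (post-model-selection) slope estimator: fit the full model
    y = b0 + b1*x + e; if the intercept's |t|-statistic is below `crit`,
    drop the intercept and refit through the origin."""
    n = len(x)
    Xf = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(Xf, y, rcond=None)[0]
    resid = y - Xf @ b
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(Xf.T @ Xf)
    t0 = b[0] / np.sqrt(cov[0, 0])
    if abs(t0) > crit:
        return b[1]                  # full-model slope
    return (x @ y) / (x @ x)         # no-intercept slope
```

The discontinuous dependence on the pretest outcome is what makes the local-parameter sequence \(\beta _{0,n}=O(n^{-1/2})\) the relevant regime, and what breaks the \(n\) out of \(n\) bootstrap here.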
Cite this article
Wei, B., Lee, S.M.S. & Wu, X. Stochastically optimal bootstrap sample size for shrinkage-type statistics. Stat Comput 26, 249–262 (2016). https://doi.org/10.1007/s11222-014-9493-x
Keywords
- Bootstrap consistency
- James–Stein estimator
- m out of n bootstrap
- Post-model-selection
- Ridge regression
- Shrinkage-type
- Stochastically optimal