The S-U algorithm for missing data problems


Summary

We present a new Monte Carlo method for solving an estimating equation that can be expressed as the expected value of a ‘full data’ estimating equation, where the expectation is taken with respect to the distribution of the missing data given the observed data. Equations of this form arise whenever the EM algorithm can be used. The algorithm alternates between two steps: an S-step, in which the missing data are simulated, either from the conditional distribution described above or from a more convenient importance-sampling distribution, and a U-step, in which the parameters are updated using a closed-form expression that requires no numerical maximization. We present two numerical examples to illustrate the method, and we establish consistency and asymptotic normality of the approximate solution obtained by our method.
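In outline, each iteration pools simulated scores and Hessians across all previous iterations and then applies a closed-form update. The sketch below is ours, not the authors' code: the pooled weighted averages and the update \(\theta_{\mathrm{j}+1} = \overline\theta_{\mathrm{j}} - \widehat{\mathbb{H}}_{\mathrm{j}}^{-1}\widehat{\mathbb{S}}_{\mathrm{j}}\) follow equations (5)–(8) and the Appendix, while the interface (function names and arguments) is hypothetical.

```python
import numpy as np

def su_algorithm(theta1, y, sample_missing, weight, score, hessian,
                 n_iter=500, M=20, seed=0):
    """Minimal sketch of the S-U iteration (hypothetical interface).

    sample_missing(theta, y, M, rng) -- S-step: draw M completions of the
        missing data from G_theta(. | y) or from an importance density.
    weight(x, y, theta) -- importance weight w_theta(x, y); identically 1
        when sampling from the conditional distribution itself.
    score(x, y, theta), hessian(x, y, theta) -- full-data estimating
        function and its derivative, evaluated at the completed data.
    """
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta1, dtype=float))
    p = theta.size
    thetas = [theta.copy()]
    sum_w, sum_sw, sum_hw = 0.0, np.zeros(p), np.zeros((p, p))
    for _ in range(n_iter):
        cur = thetas[-1]
        for x in sample_missing(cur, y, M, rng):            # S-step
            w = weight(x, y, cur)
            sum_w += w
            sum_sw += w * np.atleast_1d(score(x, y, cur))
            sum_hw += w * np.atleast_2d(hessian(x, y, cur))
        S_hat = sum_sw / sum_w     # pooled weighted average of scores
        H_hat = sum_hw / sum_w     # pooled weighted average of Hessians
        theta_bar = np.mean(thetas, axis=0)
        # U-step: closed-form update, no numerical maximization
        thetas.append(theta_bar - np.linalg.solve(H_hat, S_hat))
    return thetas[-1]
```

A toy use on right-censored exponential data, where memorylessness makes the S-step a shifted exponential draw; this example is our invention for illustration, not one of the paper's two numerical examples:

```python
rng = np.random.default_rng(1)
theta_true, c = 2.0, 1.0
y = np.minimum(rng.exponential(1 / theta_true, size=200), c)  # censor at c

def sample_missing(theta, y, M, rng):
    # X | X >= c is distributed as c + Exp(theta); observed values are kept
    out = np.tile(y, (M, 1))
    cens = y >= c
    out[:, cens] = c + rng.exponential(1 / theta[0], size=(M, cens.sum()))
    return out

weight = lambda x, y, theta: 1.0   # sampling from G_theta itself
score = lambda x, y, theta: np.array([x.size / theta[0] - x.sum()])
hessian = lambda x, y, theta: np.array([[-x.size / theta[0] ** 2]])

theta_hat = su_algorithm(np.array([1.5]), y, sample_missing,
                         weight, score, hessian)
# compare with the closed-form censored-data MLE: (#uncensored) / sum(y)
print(theta_hat, (y < c).sum() / y.sum())
```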


References

  • Andrews, D. W. K. (1992) Generic uniform convergence. Econometric Theory, 8, 241–257.

  • Celeux, G. and Diebolt, J. (1985) The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 2, 73–82.

  • Crouch, E. A. C. and Spiegelman, D. (1990) The evaluation of integrals of the form \(\int_{ - \infty }^{ + \infty } {\rm f(t)}\exp(-{\rm t}^2)\,{\rm dt}\): application to logistic-normal models. J. Amer. Statist. Assoc., 85, 464–469.

  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc. B, 39, 1–37.

  • Diebolt, J. and Ip, E. H. S. (1996) Stochastic EM: method and application. In Markov Chain Monte Carlo in Practice (eds W. R. Gilks, S. Richardson and D. J. Spiegelhalter), pp. 259–273. London: Chapman & Hall.

  • Diggle, P. J., Liang, K.-Y. and Zeger, S. L. (1994) Analysis of Longitudinal Data. Oxford, UK: Oxford University Press.

  • Gelfand, A. E. and Carlin, B. P. (1993) Maximum likelihood estimation for constrained or missing data models. Canadian J. Statist., 21, 303–311.

  • Geyer, C. J. (1991) Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (ed. E. M. Keramidas), pp. 156–163. Fairfax: Interface Foundation.

  • Geyer, C. J. and Thompson, E. A. (1992) Constrained Monte Carlo maximum likelihood for dependent data (with discussion). J. R. Statist. Soc. B, 54, 657–699.

  • Geyer, C. J. (1995) Estimation and optimization of functions. In Markov Chain Monte Carlo in Practice (eds W. R. Gilks, S. Richardson and D. J. Spiegelhalter), pp. 241–258. London: Chapman & Hall.

  • Hall, P. and Heyde, C. C. (1980) Martingale Limit Theory and its Application. New York, NY: Academic Press.

  • Jones, B. and Kenward, M. G. (1989) Design and Analysis of Cross-Over Trials. London: Chapman & Hall.

  • Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1992) Numerical Recipes in Fortran: The Art of Scientific Computing (2nd edition). Cambridge, UK: Cambridge University Press.

  • Ruppert, D., Reish, R. L., Deriso, R. B. and Carroll, R. J. (1984) Optimization using stochastic approximation and Monte Carlo simulation (with application to harvesting of Atlantic menhaden). Biometrics, 40, 535–545.

  • Ruppert, D. (1991) Stochastic approximation. In Handbook of Sequential Analysis (eds B. K. Ghosh and P. K. Sen), pp. 503–529. New York: Marcel Dekker.

  • Satten, G. A. (1996) Rank-based inference in the proportional hazards model for interval censored data. Biometrika, 83, 355–370.

  • Satten, G. A., Datta, S. and Williamson, J. (1998) Inference based on imputed failure times for the proportional hazards model with interval censored data. J. Amer. Statist. Assoc., 93, 318–327.

  • Sternberg, M. R. and Satten, G. A. (1999) Discrete-time nonparametric estimation for semi-Markov models of chain-of-events data. Biometrics, 55, 514–522.

  • Tanner, M. A. (1993) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. New York: Springer-Verlag.

  • Wei, G. C. G. and Tanner, M. A. (1990) A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Amer. Statist. Assoc., 85, 699–704.

  • Wu, C. F. J. (1985) Efficient sequential designs with binary data. J. Amer. Statist. Assoc., 80, 974–984.

  • Wu, C. F. J. (1986) Maximum likelihood recursion and stochastic approximation in sequential designs. In Adaptive Statistical Procedures and Related Topics (ed. J. Van Ryzin), pp. 298–313. Hayward, CA: Institute of Mathematical Statistics.

  • Younes, L. (1988) Estimation and annealing for Gibbsian fields. Ann. de l'Inst. Henri Poincaré, Sect. B, Prob. et Statist., 24, 269–294.

  • Zeger, S. L. and Karim, M. R. (1991) Generalized linear models with random effects: a Gibbs sampling approach. J. Amer. Statist. Assoc., 86, 79–86.

Appendix

Regularity assumptions for Theorem 1.

Let \(\Omega = [\hat \theta - \eta ,\hat \theta + \eta ]\), η > 0, be a compact neighborhood of \({\hat \theta }\); let log+(x) = max[0, log(x)], x > 0; let ∥·∥ denote the Euclidean norm; and let \({\rm{G}}_\theta ^{ - 1{\rm{i}}}({\rm{u}}\vert{{\rm{y}}_{\rm{i}}}) = \inf \{ {\rm{x}}:{\rm{G}}_\theta ^{\rm{i}}({\rm{x}}\vert{{\rm{y}}_{\rm{i}}}) \ge {\rm{u}}\} \). Then, the following conditions are assumed to hold for each 1 ≤ i ≤ N.
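For concreteness, here is this generalized inverse worked out for one familiar case (an exponential conditional distribution; our illustration, not a model from the paper):

$$\mathrm{G}_{\theta}^{\mathrm{i}}(\mathrm{x} \vert \mathrm{y}_{\mathrm{i}}) = 1 - e^{-\theta \mathrm{x}},\ \mathrm{x} \ge 0, \qquad \mathrm{G}_{\theta}^{-1\mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}) = \inf\{\mathrm{x} : 1 - e^{-\theta \mathrm{x}} \ge \mathrm{u}\} = -\theta^{-1}\log(1-\mathrm{u}).$$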

C1. There exist non-negative functions \({\rm A_i}\) and \({\rm h_i}\) with \(\int_0^1 {\rm A_i(u)}\,{\log^+}[{\rm A_i(u)}]\,{\rm du} < \infty\) and \({\rm h_i(x)} \downarrow 0\) as \({\rm x} \downarrow 0\), such that

$$\begin{array}{c}{\left\Vert\mathrm{w}_{\theta}^{\mathrm{i}}[\mathrm{G}_{\theta}^{-1 \mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]-\mathrm{w}_{\theta^{\prime}}^{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1 \mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\right\Vert} \\ { \leq \mathrm{A}_{\mathrm{i}}(\mathrm{u}) h_{\mathrm{i}}(\left\Vert\theta-\theta^{\prime}\right\Vert)}\end{array}$$

for all θ, θ′ ∈ Ω and u ∈ (0,1), and

$${E_{{\rm{G}}_\theta ^{\rm{i}}({\rm{X}}\vert{{\rm{y}}_{\rm{i}}})}}[{\rm{w}}_\theta ^{\rm{i}}({\rm{X}},{{\rm{y}}_{\rm{i}}}){\log ^ + }{\rm{w}}_\theta ^{\rm{i}}({\rm{X}},{{\rm{y}}_{\rm{i}}})] < \infty $$

C2a. There exist non-negative functions \({\rm B_i}\) and \({\rm k_i}\) with \(\int_0^1 {\rm B_i(u)}\,{\log^+}[{\rm B_i(u)}]\,{\rm du} < \infty\) and \({\rm k_i(x)} \downarrow 0\) as \({\rm x} \downarrow 0\), such that

$$\begin{array}{l}{\left\Vert \mathrm{S}_{\mathrm{i}}[\mathrm{G}_{\theta}^{-1\mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\, \mathrm{w}_{\theta}^{\mathrm{i}}[\mathrm{G}_{\theta}^{-1\mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\right.} \\ {\quad\left.-\,\mathrm{S}_{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1\mathrm{i}}(\mathrm{u}\vert\mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\, \mathrm{w}_{\theta^{\prime}}^{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1\mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\right\Vert \leq \mathrm{B}_{\mathrm{i}}(\mathrm{u})\,k_{\mathrm{i}}(\left\Vert\theta-\theta^{\prime}\right\Vert)}\end{array}$$

for all θ, θ′ ∈ Ω and u ∈ (0,1), and

$$E_{\mathrm{G}_{\theta}^{\mathrm{i}}(\mathrm{X} \vert \mathrm{y}_{\mathrm{i}})}[\left\Vert\mathrm{S}_{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\right\Vert \mathrm{w}_{\theta}^{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}}) \log ^{+}\{\left\Vert\mathrm{S}_{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\right\Vert \mathrm{w}_{\theta}^{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\}]<\infty$$

C2b. There exist non-negative functions \({\widetilde{\rm{B}}_{\rm{i}}}\) and \({\widetilde{k}_{\rm{i}}}\) with \(\int\limits_0^1 {{{\widetilde{\rm{B}}}_{\rm{i}}}({\rm{u}}){{\log }^ + }[{{\widetilde{\rm{B}}}_{\rm{i}}}({\rm{u}})]\,{\rm{du}} < \infty } \), and \({\widetilde{k}_{\rm{i}}}({\rm{x}}) \downarrow 0\) as x ↓ 0, such that

$$\begin{array}{l}{ \left\Vert \tilde{\mathrm{S}}_{\mathrm{i}}[\mathrm{G}_{\theta}^{-1 \mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}] \mathrm{w}_{\theta}^{\mathrm{i}}[\mathrm{G}_{\theta}^{-1 \mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\right.} \\ {\ \ \left.-\tilde{\mathrm{S}}_{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1 \mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}] \mathrm{w}_{\theta^{\prime}}^{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1 \mathrm{i}}(\mathrm{u} \vert \mathrm{y}_{\mathrm{i}}), \mathrm{y}_{\mathrm{i}}]\right\Vert \leq \tilde{\mathrm{B}}_{\mathrm{i}}(\mathrm{u}) \tilde{k}_{\mathrm{i}}(\left\Vert\theta-\theta^{\prime}\right\Vert)}\end{array}$$

for all θ, θ′ ∈ Ω and u ∈ (0,1), and

$$E_{\mathrm{G}_{\theta}^{\mathrm{i}}(\mathrm{X} \vert \mathrm{y}_{\mathrm{i}})}[\left\Vert\tilde{\mathrm{S}}_{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\right\Vert \mathrm{w}_{\theta}^{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}}) \log ^{+}\{\left\Vert\tilde{\mathrm{S}}_{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\right\Vert \mathrm{w}_{\theta}^{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\}]<\infty$$

C3. There exist non-negative functions \({\rm C_i}\) and \({\rm l_i}\) with \(\int_0^1 {\rm C_i(u)}\,{\log^+}[{\rm C_i(u)}]\,{\rm du} < \infty\) and \({\rm l_i(x)} \downarrow 0\) as \({\rm x} \downarrow 0\), such that

$$\begin{array}{l}\left\Vert \mathcal{H}_{\mathrm{i}}[\mathrm{G}_{\theta}^{-1\mathrm{i}}(\mathrm{u}\vert \mathrm{y}_{\mathrm{i}}),\ \mathrm{y}_{\mathrm{i}}]\mathrm{w}_{\theta}^{\mathrm{i}}[\mathrm{G}_{\theta}^{-1\mathrm{i}}(\mathrm{u}\vert \mathrm{y}_{\mathrm{i}}),\ \mathrm{y}_{\mathrm{i}}]\right.\\ \left.-\mathcal{H}_{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1\mathrm{i}}(\mathrm{u}\vert \mathrm{y}_{\mathrm{i}}),\ \mathrm{y}_{\mathrm{i}}]\mathrm{w}_{\theta^{\prime}}^{\mathrm{i}}[\mathrm{G}_{\theta^{\prime}}^{-1\mathrm{i}}(\mathrm{u}\vert \mathrm{y}_{\mathrm{i}}),\ \mathrm{y}_{\mathrm{i}}]\right\Vert \leq \mathrm{C}_{\mathrm{i}}(\mathrm{u})l_{\mathrm{i}}(\Vert \theta-\theta^{\prime}\Vert)\end{array}$$

for all θ, θ′ ∈ Ω and u ∈ (0,1), and

$$E_{\mathrm{G}_{\theta}^{\mathrm{i}}(\mathrm{X} \vert \mathrm{y}_{\mathrm{i}})}[\left\Vert\mathcal{H}_{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\right\Vert \mathrm{w}_{\theta}^{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}}) \log ^{+}\{\left\Vert\mathcal{H}_{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\right\Vert \mathrm{w}_{\theta}^{\mathrm{i}}(\mathrm{X}, \mathrm{y}_{\mathrm{i}})\}]<\infty$$

C4. \({\rm{f}}_\theta ^{\rm{i}}({{\rm{y}}_{\rm{i}}})\), \({\mathbb{S}_{\rm{i}}}({{\rm{y}}_{\rm{i}}}\vert\theta )\) and \({\mathbb{H}_{\rm{i}}}({{\rm{y}}_{\rm{i}}}\vert\theta )\) are twice continuously differentiable in \(\theta\), \({\rm{f}}_\theta ^{\rm{i}}({{\rm{y}}_{\rm{i}}})\) is positive, and \({\mathbb{H}_{\rm{T}}}(\{ {{\rm{y}}_{\rm{i}}}\} \vert\theta )\) is non-singular on Ω.

Conditions (C1)–(C3) are needed for the uniform SLLN corresponding to the summands \({\rm w_{ijk}}\), \({\rm S_{ijk}}{\rm w_{ijk}}\), \({\widetilde{\rm{S}}_{{\rm{ijk}}}}{{\rm{w}}_{{\rm{ijk}}}}\) and \({{\cal H}_{{\rm{ijk}}}}{{\rm{w}}_{{\rm{ijk}}}}\), respectively. If \({\rm{G}}_\theta ^{\rm{i}}( \cdot \vert{\rm{y}})\) does not depend on θ, these conditions take a simpler form.

Condition (C2b) is only required if \({\widetilde{\rm{S}}_{\rm{i}}}({\rm{x}},{\rm{y}}\vert\theta ) \ne {{\rm{S}}_{\rm{i}}}({\rm{x}},{\rm{y}}\vert\theta )\).

Proof of Theorem 1. For simplicity of presentation, we assume that θ is a real parameter and N = 1. In that case, we can omit the subscript i and denote \({\mathbb{S}_{\rm{T}}} = {\mathbb{S}_{\rm{1}}}\) by \(\mathbb{S}\), etc., throughout the proof. K will stand for a generic constant.

Note that we can write \({{\rm{w}}_{{\rm{ijk}}}} = {\rm{w}}_{{\theta _{\rm{j}}}}^{\rm{i}}[{\rm{G}}_{{\theta _{\rm{j}}}}^{{\rm{ - 1i}}}({{\rm{U}}_{{\rm{ijk}}}}\vert{\rm{y}}),\;{\rm{y}}]\), where the \({\rm U_{ijk}}\) are i.i.d. U[0,1]. It is easy to see that by (C1), the conditions BD, P-SLLN and S-LIP in the uniform SLLN of Andrews (1992) are satisfied; here, uniformity refers to uniformity over θ ∈ Ω. Hence, by Theorem 3 of the same paper,

$${1 \over {\rm{M}}}\sum\limits_{{\rm{k}} = 1}^{\rm{M}} {{{\rm{w}}_{{\rm{ijk}}}}} \;\;\mathop \to \limits^{{\rm{a}}.{\rm{s}}{\rm{.}}} \;\;\phi _{{\theta _{\rm{j}}}}^{\rm{i}}({\rm{y}}),$$

uniformly in j ≥ 1, as M → ∞, provided the \(\theta_{\rm j}\)'s lie in Ω; \(\phi _\theta ^{\rm{i}}( \cdot )\) is defined in (12). In the same way, by (C2a), (C2b) and (C3),

$$\begin{array}{l}{\frac{1}{\mathrm{M}} \sum\limits_{\mathrm{k}=1}^{\mathrm{M}} \mathrm{S}_{\mathrm{jk}}\mathrm{w}_{\mathrm{jk}}\ \ \overset{\mathrm{a}.\mathrm{s}.}{\rightarrow}\ \ \phi_{\theta_{\mathrm{j}}}(\mathrm{y})\, \mathbb{S}(\mathrm{y} \vert \theta_{\mathrm{j}})}, \\ {\frac{1}{\mathrm{M}} \sum\limits_{\mathrm{k}=1}^{\mathrm{M}} \tilde{\mathrm{S}}_{\mathrm{jk}} \mathrm{w}_{\mathrm{jk}}\ \ \overset{\mathrm{a}.\mathrm{s}.}{\rightarrow} \ \ \phi_{\theta_{\mathrm{j}}}(\mathrm{y})\, \tilde{\mathbb{S}}\left(\mathrm{y} \vert \theta_{\mathrm{j}}\right)},\end{array}$$

and

$$\frac{1}{\mathrm{M}} \sum_{\mathrm{k}=1}^{\mathrm{M}} \mathcal{H}_{\text{jk}}\ \mathrm{w}_{\text{jk}}\ \ \ \overset{\mathrm{a}.\mathrm{s}.}{\rightarrow}\ \ \ \phi_{\theta_{\mathrm{j}}}(\mathrm{y})\left[\mathbb{H}\left(\mathrm{y} \vert \theta_{\mathrm{j}}\right)-\mathbb{S}\left(\mathrm{y} \vert \theta_{\mathrm{j}}\right) \tilde{\mathbb{S}}^{\mathrm{T}}(\mathrm{y} \vert \theta_{\mathrm{j}})\right],$$

uniformly in j ≥ 1, as M → ∞, provided the \(\theta_{\rm j}\)'s lie in Ω. Therefore

$$\mathrm{r}_{\mathrm{j}} \equiv\left\vert\frac{\widehat{\mathbb{S}}_{\mathrm{j}}}{\widehat{\mathbb{H}}_{\mathrm{j}}}-\frac{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}} \phi_{\theta_{\mathrm{j}^{\prime}}}(\mathrm{y}) \mathbb{S}\left(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}}\right)}{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}} \phi_{\theta_{\mathrm{j}^{\prime}}}(\mathrm{y}) \mathbb{H}\left(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}}\right)}\right\vert \ \ \ \overset{\mathrm{a}.\mathrm{s}.}{\rightarrow} 0,$$
(A1.1)

uniformly in j ≥ 1, as M → ∞, provided the \(\theta_{\rm j}\)'s lie in Ω.
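The uniformity in these limits rests on the representation \({{\rm{w}}_{{\rm{ijk}}}} = {\rm{w}}_{{\theta _{\rm{j}}}}^{\rm{i}}[{\rm{G}}_{{\theta _{\rm{j}}}}^{{\rm{-1i}}}({{\rm{U}}_{{\rm{ijk}}}}\vert{\rm{y}}),\;{\rm{y}}]\) above: one fixed set of uniforms is reused at every θ, so the summands vary continuously with θ. A small numeric sketch of that device, reusing the exponential stand-in from the worked example above (ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(12345)
U = rng.uniform(size=1000)   # the U_ijk: drawn once, reused at every theta

def g_inverse(u, theta):
    # generalized inverse of the stand-in cdf G_theta(x) = 1 - exp(-theta*x)
    return -np.log1p(-u) / theta

x_a = g_inverse(U, 2.0)   # draws from G_2.0 ...
x_b = g_inverse(U, 2.1)   # ... and from G_2.1, coupled through the same U
print(np.max(np.abs(x_a - x_b)))   # small: draws move continuously in theta
```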

Next, note that by Taylor expanding both the numerator and the denominator of the second ratio in \({\rm r_j}\), we get

$$\left\vert\frac{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \phi_{\theta_{\mathrm{j}^{\prime}}}(\mathrm{y}) \mathbb{S}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}{\mathrm{j}^{-1}\sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \phi_{\theta_{\mathrm{j}^{\prime}}}(\mathrm{y})\mathbb{H}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}-\frac{\mathbb{S}(\mathrm{y} \vert \overline{\theta})}{\mathbb{H}(\mathrm{y} \vert \overline{\theta})}\right\vert \leq \mathrm{K}\,\mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}(\theta_{\mathrm{j}^{\prime}}-\overline{\theta})^{2}.$$
(A1.2)

Similarly,

$$\left\vert \mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \frac{\mathbb{S}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}{\mathbb{H}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}-\frac{\mathbb{S}(\mathrm{y} \vert \overline{\theta})}{\mathbb{H}(\mathrm{y} \vert \overline{\theta})}\right\vert \leq \mathrm{K}\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}(\theta_{\mathrm{j}^{\prime}}-\overline{\theta})^{2}.$$
(A1.3)

Combining (A1.1), (A1.2) and (A1.3) we get

$$\begin{array}{lll}\left\vert\frac{\widehat{\mathbb{S}}_{\mathrm{j}}}{\widehat{\mathbb{H}}_{\mathrm{j}}}-\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \frac{\mathbb{S}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}{\mathbb{H}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}\right\vert & \leq & \mathrm{r}_{\mathrm{j}}+\mathrm{K}\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}(\theta_\mathrm{{j}^{\prime}}-\overline{\theta})^{2} \\ & \leq & \mathrm{r}_{\mathrm{j}}+\mathrm{K}\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}(\theta_{\mathrm{j}^{\prime}}-\widehat{\theta})^{2}, \end{array}$$
(A1.4)

provided the \(\theta_{\rm j}\)'s lie in Ω.

By a Taylor expansion of \(\mathbb{S}({\rm{y}}\vert\widehat\theta )\) around \(\theta_{\rm j}\) we get

$$0=\mathbb{S}(\mathrm{y} \vert \widehat{\theta})=\mathbb{S}(\mathrm{y} \vert \theta_{\mathrm{j}})+(\widehat{\theta}-\theta_{\mathrm{j}}) \mathbb{H}(\mathrm{y} \vert \theta_{\mathrm{j}})+\frac{1}{2}(\widehat{\theta}-\theta_{\mathrm{j}})^{2}\, \mathbb{S}^{\prime \prime}(\mathrm{y} \vert \theta_{\mathrm{j}}^{\ast}),$$
(A1.5)

where \(\theta _{\rm{j}}^ {\ast} \) lies between \(\theta_{\rm j}\) and \(\widehat\theta \). Dividing (A1.5) by \(\mathbb{H}({\rm y}\vert\theta_{\rm j})\) and averaging, we obtain

$$\widehat{\theta}=\overline{\theta}_{\mathrm{j}}-\mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \frac{\mathbb{S}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}{ \mathbb{H}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}-(2 \mathrm{j})^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}(\widehat{\theta}-\theta_{\mathrm{j}^{\prime}})^{2} \frac{\mathbb{S}^{\prime \prime}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}}^{\ast})}{\mathbb{H}(\mathrm{y} \vert \theta_{\mathrm{j}^{\prime}})}.$$
(A1.6)

Using \({\theta _{{\rm{j}}+1}} = {\overline\theta _{\rm{j}}} - \widehat{\mathbb{H}}_{\rm{j}}^{-1}{\widehat{\mathbb{S}}_{\rm{j}}}\), (A1.4) and (A1.6) we obtain

$$\vert\widehat\theta - {\theta _{{\rm{j}} + 1}}\vert\; \le \;{{\rm{r}}_{\rm{j}}} + {\rm{K}}\;{{\rm{j}}^{ - 1}}\sum\limits_{{\rm{j}}\prime = 1}^{\rm{j}} {} {({\theta _{{\rm{j}}\prime }} - \widehat\theta )^2},$$

provided the \(\theta_{\rm j}\)'s lie in Ω.

Let \({\rm P} > \max(2{\rm K}, \eta^{-1})\). Then rewrite the above inequality as

$${{\rm{a}}_{{\rm{j}} + 1}}\; \le \;{\epsilon_{\rm{j}}} + {{\rm{K}} \over {\rm{P}}}\;{{\rm{j}}^{ - 1}}\sum\limits_{{\rm{j}}\prime = 1}^{\rm{j}} {} {\rm{a}}_{{\rm{j}}\prime }^2\;,$$
(A1.7)

where \({{\rm{a}}_{\rm{j}}} = {\rm{P}}\vert\widehat\theta - {\theta _{\rm{j}}}\vert\) and \({\epsilon_{\rm{j}}} = {\rm r_j}{\rm P}\), j ≥ 1. By (A1.1), on a set of probability one, we can select M large enough that \({\epsilon_{\rm{j}}} < {1 \over 2}\) for all j ≥ 1, provided the \(\theta_{\rm j}\)'s lie in Ω. Suppose that the starting value \({\theta _1} \in \Omega = [\widehat\theta - \eta ,\widehat\theta + \eta ]\). Then \({\rm a}_1 < 1\), and using (A1.7) we find that \({{\rm{a}}_2} \le ({1 \over 2} + {{\rm{K}} \over {\rm{P}}}) < 1\), which implies that θ2 ∈ Ω. Inductively using (A1.7) we find that \(\theta_{\rm j} \in \Omega\) and \(0 \le {{\rm{a}}_{\rm{j}}} \le ({1 \over 2} + {{\rm{K}} \over {\rm{P}}})\) for all j ≥ 1. Therefore, applying “limsup” to both sides of (A1.7) and using the fact that for aj ≥ 0,

$$\mathop {\lim \sup }\limits_{{\rm{j}}\; \to \;\infty } \;{{\rm{j}}^{ - 1}}\sum\limits_{{\rm{j}}\prime = 1}^{\rm{j}} {} {\rm{a}}_{{\rm{j}}\prime }^2 \le {(\mathop {\lim \sup }\limits_{{\rm{j}}\; \to \;\infty } \;{{\rm{a}}_{\rm{j}}})^2}\;,$$

and that \({{\rm{r}}_{\rm{j}}}\mathop \to \limits^{{\rm{a}}.{\rm{s}}.} 0\) as j → ∞ (for any given replication size M), we conclude that on a set of probability one,

$$\lim \sup\limits_{{\rm{j}} \to \infty } \,{{\rm{a}}_{\rm{j}}} \le {1 \over 2}{(\lim \sup\limits_{{\rm{j}} \to \infty } \,{{\rm{a}}_{\rm{j}}})^2},$$

which immediately implies \(\lim \sup\limits_{{\rm{j}} \to \infty } \,{{\rm{a}}_{\rm{j}}} = 0\), since \(\lim \sup\limits_{{\rm{j}} \to \infty } \,{{\rm{a}}_{\rm{j}}} \in [0,1)\). Hence, convergence of θj to \(\widehat\theta \) is established. For the case N > 1, note that while equations (A1.1)–(A1.3) become considerably more complicated, equation (A1.4) remains unchanged. □

Regularity assumptions for Theorem 2: Assume the conditions for Theorem 1 plus

N1. The matrix \(\sum\limits_{\rm{i}} {{{\rm{V}}^{\rm{i}}}} \) is positive definite.

Proof of Theorem 2. Write

$$\sqrt {\rm{j}} ({\theta _{{\rm{j}} + 1}} - \widehat\theta ) = \sqrt {\rm{j}} \,({\overline \theta _{\rm{j}}} - \widehat\theta - \mathbb{\widehat{H}}_{\rm{j}}^{ - 1} {\mathbb{\widehat{S}}_{\rm{j}}})$$
$$\begin{array}{l}{=\widehat{\mathbb{H}}_{\mathrm{j}}^{-1}\{\mathbb{H}_{\mathrm{T}}(\{ \mathrm{y}_{\mathrm{i}}\} \vert \widehat{\theta})-\widehat{\mathbb{H}}_{\mathrm{j}}\} \sqrt{\mathrm{j}}(\overline{\theta}_{\mathrm{j}}-\widehat{\theta})} \\ {+\widehat{\mathbb{H}}_{\mathrm{j}}^{-1}[\sqrt{\mathrm{j}}\{\mathbb{H}_{\mathrm{T}}(\{\mathrm{y}_{\mathrm{i}}\} \vert \widehat{\theta})(\overline{\theta}_{\mathrm{j}}-\widehat{\theta})-\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \mathbb{S}_{\mathrm{T}}( \{\mathrm{y}_{\mathrm{i}}\} \vert \theta_{\mathrm{j}^{\prime}})\}]} \\ {-\widehat{\mathbb{H}}_{\mathrm{j}}^{-1} \sqrt{\mathrm{j}} \{\widehat{\mathbb{S}}_{\mathrm{j}}-\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \mathbb{S}_{\mathrm{T}}(\{\mathrm{y}_{\mathrm{i}}\} \vert \theta_{\mathrm{j}^{\prime}})\}}.\end{array}$$
(A2.1)

Since \({\mathbb{\widehat{H}}_{\rm{j}}}\mathop \to \limits^P {\mathbb{H}_{\rm{T}}}(\{ {{\rm{y}}_{\rm{i}}}\} \vert{\widehat\theta })\) and \(\sqrt {\rm{j}} \,({\overline \theta _{\rm{j}}} - \widehat\theta )\) is Op(1), the first term of (A2.1) converges to zero in probability.

Taylor expanding \({\mathbb{S}_{\rm{T}}}(\{ {{\rm{y}}_{\rm{i}}}\} \vert{\theta _{\rm{j}}})\) around \(\widehat\theta \) and averaging, we obtain

$${{\rm{j}}^{ - 1}}\sum\limits_{{\rm{j}}\prime = 1}^{\rm{j}} {} {\mathbb{S}_{\rm{T}}}(\{ {{\rm{y}}_{\rm{i}}}\} {\rm{\vert}}{\theta _{{\rm{j}}\prime }}) = {\mathbb{H}_{\rm{T}}}(\{ {{\rm{y}}_{\rm{i}}}\} {\rm{\vert}}\hat \theta )\;({\bar \theta _{\rm{j}}} - \hat \theta ) + {{\rm{O}}_{\rm{p}}}({{\rm{j}}^{ - 1}}\sum\limits_{{\rm{j}}\prime = 1}^{\rm{j}} {} {\rm{\vert}}{\theta _{{\rm{j}}\prime }} - \hat \theta {{\rm{\vert}}^2})\;;$$

therefore, the second term on the right-hand side of (A2.1) is op(1), since \({{\rm{j}}^{-1/2}}\sum\limits_{{\rm{j}}\prime = 1}^{\rm{j}} {\vert{\theta _{{\rm{j}}\prime }} - \widehat\theta \vert^2} = {{\rm{o}}_{\rm{p}}}(1)\).

From (5), (6) and (8), we have

$$\widehat{\mathbb{S}}_{\mathrm{j}} \equiv \sum_{\mathrm{i}=1}^{\mathrm{N}} \frac{\frac{1}{\mathrm{j}\cdot\mathrm{M}} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \sum\limits_{\mathrm{k}=1}^{\mathrm{M}} \mathrm{S}_{\mathrm{ij}^{\prime}\mathrm{k}} \mathrm{w}_{\mathrm{ij}^{\prime}\mathrm{k}}}{\frac{1}{\mathrm{j}\cdot\mathrm{M}} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \sum\limits_{\mathrm{k}=1}^{\mathrm{M}} \mathrm{w}_{\mathrm{ij}^{\prime}\mathrm{k}}} \equiv \sum_{\mathrm{i}=1}^{\mathrm{N}} \frac{\mathrm{N}_{\mathrm{i}}}{\mathrm{D}_{\mathrm{i}}}.$$

It is easy to verify the conditional Lindeberg condition for linear combinations of (Ni, Di). By the martingale central limit theorem (Corollary 3.1 of Hall and Heyde, 1980), the asymptotic distribution of \(\sqrt {\rm{j}} \,\sum\limits_{{\rm{i}} = 1}^{\rm{N}} {({{\rm{N}}_{\rm{i}}},{{\rm{D}}_{\rm{i}}})} \) is the multivariate normal distribution with mean vector

$$\sqrt{\mathrm{j}}\left(\sum_{\mathrm{i}=1}^{\mathrm{N}} \mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}\mathbb{S}_{\mathrm{i}}(\mathrm{y}_{\mathrm{i}} \vert \theta_{\mathrm{j}^{\prime}}) \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}}), \sum\limits_{\mathrm{i}=1}^{\mathrm{N}}\mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})\right)$$

and variance-covariance matrix \({{\rm{M}}^{ - 1}}\sum\limits_{{\rm{i}} = 1}^{\rm{N}} {{{\rm{V}}^{\rm{i}}}} \), where \({\rm{V}}^{\rm{i}}\) is defined in (15)–(18). By the delta method, we have

$$\sqrt {\rm{j}} \,({\mathbb{\widehat{S}}_{\rm{j}}} - {\mu _{\rm{j}}})\;\mathop \to \limits^d \;{\rm{N}}(0,\mathbb{V}),$$
(A2.2)

where \(\mu_{\mathrm{j}}=\sum_{\mathrm{i}=1}^{\mathrm{N}} \frac{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \mathbb{S}_{\mathrm{i}}(\mathrm{y}_{\mathrm{i}} \vert \theta_{\mathrm{j}^{\prime}}) \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})}{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})}\) and where \(\mathbb{V}\) is given in equation (19). Using a Taylor series argument as in the proof of Theorem 1, we get for each i

$$\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \frac{\mathbb{S}_{\mathrm{i}}(\mathrm{y}_{\mathrm{i}} \vert \theta_{\mathrm{j}^{\prime}}) \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})}{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})}=\mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \mathbb{S}_{\mathrm{i}}(\mathrm{y}_{\mathrm{i}} \vert \theta_{\mathrm{j}^{\prime}})+\mathrm{O}_{\mathrm{p}}(\mathrm{j}^{-1} \sum_{\mathrm{j}^{\prime}=1}^{\mathrm{j}}(\theta_{\mathrm{j}^{\prime}}-\widehat{\theta})^{2});$$

hence,

$$\sqrt{\mathrm{j}}\ \left\{\mu_{\mathrm{j}}-\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \mathbb{S}(\{ \mathrm{y}_{\mathrm{i}}\} \vert \theta_{\mathrm{j}^{\prime}})\right\}=o_{\mathrm{p}}(1).$$
(A2.3)

Combining (A2.1), (A2.2) and (A2.3) we find that \(\sqrt {\rm{j}} \;({\theta _{{\rm{j}} + 1}} - \widehat\theta )\;\;\mathop \to \limits^d \;\;{\rm{N}}(0,\Sigma )\), where

$$\Sigma = {\mathbb{H}^{ - 1}}(\{ {{\rm{y}}_{\rm{i}}}\} \vert\widehat\theta ) \cdot \mathbb{V}\cdot {\mathbb{H}^{ - {\rm{T}}}}(\{ {{\rm{y}}_{\rm{i}}}\} \vert\widehat\theta ).$$
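For completeness, the sandwich form above in a few lines (a sketch; H and V stand for already-computed estimates of \(\mathbb{H}(\{ {\rm y_i}\} \vert \widehat\theta)\) and \(\mathbb{V}\), and the numbers are purely illustrative):

```python
import numpy as np

def sandwich_cov(H, V):
    """Sigma = H^{-1} V H^{-T}: asymptotic covariance of sqrt(j)(theta_{j+1} - theta_hat)."""
    H_inv = np.linalg.inv(H)
    return H_inv @ V @ H_inv.T

H = np.array([[2.0, 0.3], [0.3, 1.5]])
V = np.array([[0.5, 0.1], [0.1, 0.4]])
print(sandwich_cov(H, V))
```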

Citation
Satten, G.A., Datta, S. The S-U algorithm for missing data problems. Computational Statistics 15, 243–277 (2000). https://doi.org/10.1007/s001800000031
