Summary
We present a new Monte Carlo method for finding the solution of an estimating equation that can be expressed as the expected value of a 'full-data' estimating equation, where the expectation is taken with respect to the distribution of the missing data given the observed data. Equations of this kind arise whenever the EM algorithm can be used. The algorithm alternates between two steps: an S-step, in which the missing data are simulated, either from the conditional distribution described above or from a more convenient importance sampling distribution, and a U-step, in which the parameters are updated using a closed-form expression that requires no numerical maximization. We present two numerical examples to illustrate the method, and we obtain theoretical results establishing the consistency and asymptotic normality of the approximate solution produced by our method.
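To fix ideas, here is a minimal sketch of the two-step alternation in Python, on a toy censored-normal problem of our own devising; the rejection-sampling S-step and the one-line U-step are illustrative simplifications (the actual U-step also accommodates importance sampling weights and averages across iterations), not the algorithm exactly as developed in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup (our own illustration, not one of the paper's examples):
# full data x_i ~ N(theta, 1); x_i is observed only if x_i <= c, and
# otherwise we learn just that x_i exceeded c.  The full-data estimating
# equation is S(x; theta) = sum_i (x_i - theta).
c, theta_true, n = 1.0, 0.7, 400
x_full = rng.normal(theta_true, 1.0, size=n)
y = np.where(x_full <= c, x_full, np.nan)      # NaN marks a censored value

def s_step(theta, m=20):
    # S-step: simulate each missing x_i from its conditional distribution
    # given the observed data; here that is N(theta, 1) truncated to
    # (c, infinity), drawn by simple rejection sampling.
    sims = []
    for _ in range(m):
        x = y.copy()
        for i in np.flatnonzero(np.isnan(x)):
            z = rng.normal(theta, 1.0)
            while z <= c:
                z = rng.normal(theta, 1.0)
            x[i] = z
        sims.append(x)
    return sims

theta = 0.0                                     # starting value
for j in range(50):                             # S-U iterations
    sims = s_step(theta)
    # U-step: the averaged full-data equation sum_k sum_i (x_ik - theta) = 0
    # is solved in closed form; no numerical maximization is needed.
    theta = np.mean([x.mean() for x in sims])

print(theta)                                    # settles near theta_true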




References
Andrews, D. W. K. (1992) Generic uniform convergence. Econometric Theory 8, 241–257.
Celeux, G. and Diebolt, J. (1985) The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Comp. Stat. Quarterly, 2, 73–82.
Crouch, E. A. C. and Spiegelman, D. (1990) The evaluation of integrals of the form \(\int_{-\infty}^{+\infty} f(t)\exp(-t^2)\,dt\): application to logistic-normal models. J. Amer. Statist. Assoc., 85, 464–469.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. R. Statist. Soc. B, 39, 1–37.
Diebolt, J. and Ip, E. H. S. (1996) Stochastic EM: method and application. In Markov Chain Monte Carlo in Practice (eds W. R. Gilks, S. Richardson and D. J. Spiegelhalter), pp. 259–273. London: Chapman & Hall.
Diggle, P. J., Liang, K.-Y., and Zeger, S. L. (1994) Analysis of Longitudinal Data, Oxford, UK: Oxford University Press.
Gelfand, A. E. and Carlin, B. P. (1993) Maximum likelihood estimation for constrained or missing data models, Canadian J. Statist. 21, 303–311.
Geyer, C. J. (1991) Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (ed. E. M. Keramidas), pp. 156–163. Fairfax: Interface Foundation.
Geyer, C. J. and Thompson, E. A. (1992) Constrained Monte Carlo maximum likelihood for dependent data (with discussion). J. R. Statist. Soc. B, 54, 657–699.
Geyer, C. J. (1995) Estimation and Optimization of Functions. In Markov Chain Monte Carlo in Practice (eds. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), pp. 241–258. London: Chapman & Hall.
Hall, P. and Heyde, C. C. (1980) Martingale Limit Theory and Its Application, New York, NY: Academic Press.
Jones, B., and Kenward, M. G. (1989) Design and Analysis of Cross-Over Studies, London: Chapman & Hall.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992) Numerical Recipes in Fortran: The Art of Scientific Computing (2nd edition), Cambridge, UK: Cambridge University Press.
Ruppert, D., Reish, R. L., Deriso, R. B., and Carroll, R. J. (1984) Optimization using Stochastic Approximation and Monte Carlo Simulation (with Application to Harvesting of Atlantic Menhaden). Biometrics, 40, 535–545.
Ruppert, D. (1991) Stochastic approximation. In Handbook of Sequential Analysis (eds B. K. Ghosh and P. K. Sen), pp. 503–529. New York: Marcel Dekker.
Satten, G. A. (1996) Rank-based inference in the proportional hazards model for interval censored data. Biometrika, 83, 355–370.
Satten, G. A., Datta, S., and Williamson, J. (1998) Inference based on imputed failure times for the proportional hazards model with interval censored data. J. Amer. Statist. Assoc., 93, 318–327.
Sternberg, M. R. and Satten, G. A. (1999). Discrete-time nonparametric estimation for semi-Markov models of chain-of-events data. Biometrics 55, 514–522.
Tanner, M. A. (1993) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. New York: Springer-Verlag.
Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithm. J. Amer. Statist. Assoc. 85, 699–704.
Wu, C. F. J. (1985) Efficient sequential designs with binary data. J. Amer. Statist. Assoc. 80, 974–984.
Wu, C. F. J. (1986) Maximum likelihood recursion and stochastic approximation in sequential designs. In Adaptive Statistical Procedures and Related Topics (ed. J. Van Ryzin), pp. 298–313. Hayward, CA: Institute of Mathematical Statistics.
Younes, L. (1988) Estimation and annealing for Gibbsian fields. Ann. de l'Inst. Henri Poincaré, Sect. B, Prob. et Statist., 24, 269–294.
Zeger, S. L. and Karim, M. R. (1991) Generalized linear models with random effects: a Gibbs sampling approach. J. Amer. Statist. Assoc., 86, 79–86.
Appendix
Regularity assumptions for Theorem 1.
Let \(\Omega = [\hat\theta - \eta, \hat\theta + \eta]\), \(\eta > 0\), be a compact neighborhood of \(\hat\theta\), let \(\log^+(x) = \max[0, \log(x)]\) for \(x > 0\), let \(\Vert\cdot\Vert\) denote the Euclidean norm, and let \(G_\theta^{-1,i}(u \mid y_i) = \inf\{x : G_\theta^i(x \mid y_i) \ge u\}\). Then the following conditions are assumed to hold for each \(1 \le i \le N\).
C1. There exist non-negative functions \(A_i\) and \(h_i\) with \(\int_0^1 A_i(u)\log^+[A_i(u)]\,du < \infty\) and \(h_i(x) \downarrow 0\) as \(x \downarrow 0\), such that
∀ θ, θ′ ∈ Ω, u ∈ (0,1), and
C2a. There exist non-negative functions \(B_i\) and \(k_i\) with \(\int_0^1 B_i(u)\log^+[B_i(u)]\,du < \infty\) and \(k_i(x) \downarrow 0\) as \(x \downarrow 0\), such that
∀θ, θ′ ∈ Ω, u ∈ (0,1), and
C2b. There exist non-negative functions \(\widetilde{B}_i\) and \(\widetilde{k}_i\) with \(\int_0^1 \widetilde{B}_i(u)\log^+[\widetilde{B}_i(u)]\,du < \infty\) and \(\widetilde{k}_i(x) \downarrow 0\) as \(x \downarrow 0\), such that
C3. There exist non-negative functions \(C_i\) and \(l_i\) with \(\int_0^1 C_i(u)\log^+[C_i(u)]\,du < \infty\) and \(l_i(x) \downarrow 0\) as \(x \downarrow 0\), such that
∀θ,θ′ ∈ Ω, u ∈ (0,1), and
C4. \(f_\theta^i(y_i)\), \(\mathbb{S}_i(y_i \mid \theta)\) and \(\mathbb{H}_i(y_i \mid \theta)\) are twice continuously differentiable in \(\theta\); \(f_\theta^i(y_i)\) is positive; and \(\mathbb{H}_T(\{y_i\} \mid \theta)\) is non-singular on \(\Omega\).
Conditions (C1)–(C3) are needed for the uniform SLLN corresponding to the summands \(w_{ijk}\), \(S_{ijk} w_{ijk}\), \(\widetilde{S}_{ijk} w_{ijk}\) and \(\mathcal{H}_{ijk} w_{ijk}\), respectively. If \(G_\theta^i(\cdot \mid y)\) does not depend on \(\theta\), these conditions take a simpler form.
Condition (C2b) is required only if \(\widetilde{S}_i(x, y \mid \theta) \ne S_i(x, y \mid \theta)\).
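As a small numerical illustration of the generalized inverse \(G_\theta^{-1,i}\) defined above (a toy discrete distribution of our own, coded in Python): on a distribution with jumps, \(G^{-1}(u)\) returns the smallest support point whose cumulative probability reaches \(u\), which is why the infimum form is used.

import numpy as np

# Generalized inverse G^{-1}(u) = inf{x : G(x) >= u} of a discrete CDF
# placing probabilities p on support points xs (an illustrative example).
xs = np.array([0.0, 1.0, 3.0])
p = np.array([0.2, 0.5, 0.3])
cdf = np.cumsum(p)

def g_inverse(u):
    # index of the first support point with cumulative probability >= u
    return xs[np.searchsorted(cdf, u)]

rng = np.random.default_rng(1)
print(g_inverse(rng.uniform(size=8)))   # draws from (xs, p) via G^{-1}(U)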
Proof of Theorem 1. For simplicity of presentation, we assume that \(\theta\) is a real parameter and \(N = 1\). In that case, we can omit the subscript i and denote \(\mathbb{S}_T = \mathbb{S}_1\) by \(\mathbb{S}\), etc., throughout the proof. K will stand for a generic constant.
Note that we can write \(w_{jk} = w_{\theta_j}[G_{\theta_j}^{-1}(U_{jk} \mid y),\, y]\), where the \(U_{jk}\) are i.i.d. U[0,1]. It is easy to see that, by (C1), the conditions BD, P-SLLN and S-LIP in the uniform SLLN of Andrews (1992) are satisfied; here, uniformity refers to uniformity in \(\theta \in \Omega\). Hence, by Theorem 3 of the same paper,
uniformly in \(j \ge 1\), as \(M \to \infty\), provided the \(\theta_j\)'s lie in \(\Omega\); \(\phi_\theta^i(\cdot)\) is defined in (12). In the same way, by (C2) and (C3),
and
uniformly in \(j \ge 1\), as \(M \to \infty\), provided the \(\theta_j\)'s lie in \(\Omega\). Therefore
uniformly in \(j \ge 1\), as \(M \to \infty\), provided the \(\theta_j\)'s lie in \(\Omega\).
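The inverse-CDF representation used at the start of the proof treats the summands as a \(\theta\)-indexed family of functions of one fixed i.i.d. uniform sequence, which is what brings them into the scope of Andrews' (1992) generic uniform convergence results. A brief Python illustration with an exponential \(G_\theta\) (our own example, unrelated to the paper's models):

import numpy as np

rng = np.random.default_rng(2)

# One uniform sample drives the draws for every theta: x(theta) = G_theta^{-1}(U).
# For G_theta the Exponential(theta) CDF, G_theta^{-1}(u) = -log(1 - u)/theta,
# so the Monte Carlo averages below use common random numbers across theta.
u = rng.uniform(size=200_000)
thetas = np.linspace(0.5, 2.0, 16)
mc_means = np.array([(-np.log1p(-u) / th).mean() for th in thetas])

print(np.abs(mc_means - 1.0 / thetas).max())   # small, uniformly over theta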
Next, note that by Taylor expanding both the numerator and the denominator of the second ratio in \(r_j\) we get
Similarly,
Combining (A1.1), (A1.2) and (A1.3) we get
provided the \(\theta_j\)'s lie in \(\Omega\).
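To orient the next step, here is the expansion in generic form (our notation, a sketch of the first-order pattern): since \(\hat\theta\) solves the estimating equation, \(\mathbb{S}(y \mid \hat\theta) = 0\), so

\[
0 \;=\; \mathbb{S}(y \mid \hat\theta) \;=\; \mathbb{S}(y \mid \theta_j) + (\hat\theta - \theta_j)\,\mathbb{H}(y \mid \theta_j^{\ast}),
\qquad\text{whence}\qquad
\hat\theta - \theta_j \;=\; -\,\frac{\mathbb{S}(y \mid \theta_j)}{\mathbb{H}(y \mid \theta_j^{\ast})},
\]

which is, schematically, how the update \(\theta_{j+1} = \bar\theta_j / \hat{\mathbb{H}}_j\) tracks \(\hat\theta\) up to the Monte Carlo error \(r_j\).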
By a Taylor expansion of \(\mathbb{S}(y \mid \hat\theta)\) around \(\theta_j\) we get
where \(\theta_j^{\ast}\) lies between \(\theta_j\) and \(\hat\theta\). Dividing (A1.5) by \(\mathbb{H}(y \mid \theta_j)\) and averaging, we obtain
Using \(\theta_{j+1} = \bar\theta_j / \hat{\mathbb{H}}_j\), (A1.4) and (A1.6), we obtain
provided the \(\theta_j\)'s lie in \(\Omega\).
Let \(P > \max(2K, \eta^{-1})\). Then rewrite the above inequality as
where \(a_j = P\,\vert\hat\theta - \theta_j\vert\) and \(\epsilon_j = r_j P\), \(j \ge 1\). By (A1.1), on a set of probability one, select \(M\) large enough so that \(\epsilon_j < \frac{1}{2}\), \(\forall\, j \ge 1\), provided the \(\theta_j\)'s lie in \(\Omega\). Suppose that the starting value \(\theta_1 \in \Omega = [\hat\theta - \eta, \hat\theta + \eta]\). Then \(a_1 < 1\) and, using (A1.7), we find that \(a_2 \le (\frac{1}{2} + \frac{K}{P}) < 1\), which implies that \(\theta_2 \in \Omega\). Inductively using (A1.7), we find that \(\theta_j \in \Omega\) and \(0 \le a_j \le (\frac{1}{2} + \frac{K}{P})\), \(\forall\, j \ge 1\). Therefore, applying "limsup" to both sides of (A1.7) and using the fact that, for \(a_j \ge 0\),
and that \(r_j \xrightarrow{\ \mathrm{a.s.}\ } 0\) as \(j \to \infty\) (for any given replication size \(M\)), we conclude that, on a set of probability one,
which immediately implies \(\limsup_{j \to \infty} a_j = 0\), since \(\limsup_{j \to \infty} a_j \in [0, 1)\). Hence, the convergence of \(\theta_j\) to \(\hat\theta\) is established. For the case \(N > 1\), note that while equations (A1.1)–(A1.3) become considerably more complicated, equation (A1.4) remains unchanged. □
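The limsup step rests on an elementary fact about non-negative sequences, sketched here under the assumption that (A1.7) has the generic contraction form \(a_{j+1} \le (\epsilon_j + K/P)\,a_j + b_j\) with \(b_j \to 0\): taking limsups on both sides gives

\[
\limsup_{j \to \infty} a_j \;\le\; \Bigl(\frac{1}{2} + \frac{K}{P}\Bigr) \limsup_{j \to \infty} a_j ,
\]

and since \(\frac{1}{2} + \frac{K}{P} < 1\) (because \(P > 2K\)) and \(\limsup_j a_j\) is finite, this forces \(\limsup_j a_j = 0\).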
Regularity assumptions for Theorem 2. Assume the conditions for Theorem 1 plus
N1. The matrix \(\sum_i V^i\) is positive definite.
Proof of Theorem 2. Write
Since \(\hat{\mathbb{H}}_j \xrightarrow{\ P\ } \mathbb{H}_T(\{y_i\} \mid \hat\theta)\) and \(\sqrt{j}\,(\bar\theta_j - \hat\theta)\) is \(O_p(1)\), the first term of (A2.1) converges to zero in probability.
Taylor expanding \(\mathbb{S}_T(\{y_i\} \mid \theta_j)\) around \(\hat\theta\) and averaging, we obtain
therefore, the second term on the right-hand side of (A2.1) is \(o_p(1)\), since \(j^{-1/2} \sum_{j'=1}^{j} \vert\theta_{j'} - \hat\theta\vert = O_p(1)\).
From (5), (6) and (8), we have
It is easy to verify the conditional Lindeberg condition for linear combinations of \((N_i, D_i)\). By the martingale central limit theorem (Corollary 3.1 of Hall and Heyde, 1980), the asymptotic distribution of \(\sqrt{j}\,\sum_{i=1}^{N} (N_i, D_i)\) is the multivariate normal distribution with mean vector
and variance-covariance matrix \(M^{-1} \sum_{i=1}^{N} V^i\), where the \(V^i\) are defined in (15)–(18). By the delta method, we have
where \(\mu_{\mathrm{j}}=\sum_{\mathrm{i}=1}^{\mathrm{N}} \frac{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \mathbb{S}_{\mathrm{i}}(\mathrm{y}_{\mathrm{i}} \vert \theta_{\mathrm{j}^{\prime}}) \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})}{\mathrm{j}^{-1} \sum\limits_{\mathrm{j}^{\prime}=1}^{\mathrm{j}} \phi_{\theta_{\mathrm{j}^{\prime}}}^{\mathrm{i}}(\mathrm{y}_{\mathrm{i}})}\) and where \(\mathbb{V}\) is given in equation (19). Using a Taylor series argument as in the proof of Theorem 1, we get for each i
hence,
Combining (A2.1), (A2.2) and (A2.3), we find that \(\sqrt{j}\,(\theta_{j+1} - \hat\theta) \xrightarrow{\ d\ } N(0, \Sigma)\), where
□
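For reference, the delta-method step used above can be stated generically (our notation, with \(\bar{N}\) and \(\bar{D}\) standing for an averaged numerator and denominator): if \(\sqrt{j}\,(\bar{N} - \nu, \bar{D} - \delta)^{\mathsf T} \xrightarrow{\ d\ } N(0, V)\) with \(\delta \ne 0\), then

\[
\sqrt{j}\left(\frac{\bar{N}}{\bar{D}} - \frac{\nu}{\delta}\right) \xrightarrow{\ d\ } N\bigl(0,\ g^{\mathsf T} V g\bigr),
\qquad g = \Bigl(\frac{1}{\delta},\ -\frac{\nu}{\delta^{2}}\Bigr)^{\mathsf T},
\]

which is the form applied above to each component of the estimating equation.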