Optimal imputation of the missing data using multi auxiliary information

Original paper, Computational Statistics

Abstract

This article proposes some new imputation methods that extend the work of Bhushan and Pandey by using multi-auxiliary information. Popular imputation methods such as mean imputation, ratio imputation, regression imputation and the power transformation method are special cases of the proposed methods, and they are less efficient than the proposed methods. The proposed methods can be regarded as efficient extensions of the work of Singh and Deo (Stat Pap 44:555–579, 2003), Singh (Stat A J Theor Appl Stat 43(5):499–511, 2009), Ahmed et al. (Stat Transit 7(6):1247–1264, 2006), Diana and Perri (Commun Stat Theory Methods 39:3245–3251, 2010) and Bhushan and Pandey (J Stat Manag Syst 19(6):755–769, 2016; Commun Stat Theory Methods 47(11):2576–2589, 2018). The theoretical results are derived, and a comparative study using real and simulated data shows that the proposed methods improve on all of the methods discussed.
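As a point of reference for the baseline methods named above, the sketch below applies mean, ratio and regression imputation with a single auxiliary variable x observed on all n sampled units while y is missing for n - r of them. The helper `impute_mean` and the toy data are hypothetical illustrations of the standard single-auxiliary forms; they are not the proposed multi-auxiliary methods.

```r
# Hypothetical helper: impute the missing y-values from one auxiliary variable x
# (observed for every sampled unit) and return the mean of the completed data.
impute_mean <- function(y, x, method = c("mean", "ratio", "regression")) {
  method <- match.arg(method)
  resp <- !is.na(y)                        # responding units (r of the n units)
  ybar_r <- mean(y[resp])
  xbar_r <- mean(x[resp])
  y_imp <- y
  y_imp[!resp] <- switch(method,
    mean       = ybar_r,                                   # mean imputation
    ratio      = (ybar_r / xbar_r) * x[!resp],             # ratio imputation
    regression = ybar_r + cov(y[resp], x[resp]) / var(x[resp]) *
                   (x[!resp] - xbar_r))                    # regression imputation
  mean(y_imp)                              # point estimate of the mean
}

# Toy data (hypothetical): y is exactly proportional to x, three y-values missing
x <- 1:10
y <- 2 * x
y[c(2, 5, 9)] <- NA
est <- sapply(c("mean", "ratio", "regression"), function(m) impute_mean(y, x, m))
print(est)
```

Because y is exactly proportional to x in the toy data, ratio and regression imputation both recover the full-sample mean of 11, while mean imputation simply returns the respondents' mean (78/7).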

References

  • Ahmed MS, Al-Titi O, Al-Rawi Z, Abu-Dayyeh W (2006) Estimation of a population mean using different imputation methods. Stat Transit 7(6):1247–1264

  • Bhushan S, Pandey AP (2016) Optimal imputation of missing data for estimation of population mean. J Stat Manag Syst 19(6):755–769

  • Bhushan S, Pandey AP (2018) Optimality of ratio type estimation methods for population mean in presence of missing data. Commun Stat Theory Methods 47(11):2576–2589

  • Bhushan S, Pandey AP, Pandey A (2018) On optimality of imputation methods for estimation of population mean using higher order moment of an auxiliary variable. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2018.1500595

  • Diana G, Perri PF (2010) Improved estimators of the population mean for missing data. Commun Stat Theory Methods 39:3245–3251

  • El-Badry MA (1956) A sampling procedure for mailed questionnaires. J Am Stat Assoc 51:209–227

  • Hansen MH, Hurwitz WN (1946) The problem of non-response in sample surveys. J Am Stat Assoc 41:517–529

  • Heitjan DF, Basu S (1996) Distinguishing ‘missing at random’ and ‘missing completely at random’. Am Stat 50:207–213

  • Kalton G, Kasprzyk D (1982) Imputing for missing survey responses. In: Proceedings of the section on survey research methods. American Statistical Association, pp 22–31

  • Kalton G, Kasprzyk D, Santos R (1981) Issues of nonresponse and imputation in the survey of income and program participation. In: Krewski D, Platek R, Rao JNK (eds) Current topics in survey sampling. Academic Press, New York, pp 455–480

  • Kadilar C, Cingi H (2008) Estimators for the population mean in the case of missing data. Commun Stat Theory Methods 37:2226–2236

  • Lee H, Rancourt E, Särndal CE (1994) Experiments with variance estimation from survey data with imputed values. J Off Stat 10:231–243

  • Lee H, Rancourt E, Särndal CE (1995) Variance estimation in the presence of imputed data for the generalized estimation system. In: Proceedings of the section on survey research methods. American Statistical Association

  • Olkin I (1958) Multi-variate ratio estimation for finite population. Biometrika 43:154–163

  • Prasad S (2017) A study on new methods of ratio exponential type imputation in sample surveys. Hacettepe J Math Stat. https://doi.org/10.15672/HJMS.2016.392

  • Rubin DB (1976) Inference and missing data. Biometrika 63(3):581–592

  • Searls DT (1964) The utilization of a known coefficient of variation in the estimation procedure. J Am Stat Assoc 59:1225–1226

  • Srinath KP (1971) Multiphase sampling in non-response problems. J Am Stat Assoc 66:583–586

  • Singh MP (1969) Some aspects of estimation in sampling from finite population. PhD thesis, Indian Statistical Institute, Calcutta, India

  • Singh S (2003) Advanced sampling theory with applications. How Michael selected Amy, vol 1 & 2. Kluwer, Dordrecht

  • Singh S (2009) A new method of imputation in survey sampling. Stat A J Theor Appl Stat 43(5):499–511

  • Singh S, Deo B (2003) Imputation by power transformation. Stat Pap 44:555–579

  • Singh S, Horn S (2000) Compromised imputation in survey sampling. Metrika 51:267–276

  • Singh GN, Suman S (2019) Estimation of population mean using imputation methods for missing data under two-phase sampling design. J Stat Theory Pract. https://doi.org/10.1007/s42519-018-0016-5

  • Singh GN, Pandey AK, Sharma AK (2020) Some improved and alternative imputation methods for finite population mean in presence of missing information. Commun Stat Theory Methods. https://doi.org/10.1080/03610926.2020.1713375

  • Srivastava SK (1967) An estimator using auxiliary information in sample surveys. Calcutta Stat Assoc Bull 16:121–132

  • Walsh JE (1970) Generalization of ratio estimator for population total. Sankhya A 32:99–106

Acknowledgements

The authors are deeply grateful to the referees and the editors for their rigorous review and comments, which significantly improved the revised manuscript.

Author information

Correspondence to Abhay Pratap Pandey.

Appendices

Appendix 1

Outline of Derivation of Theorem 3.1. The minimum MSEs of \(T_{k}\) and \( T_{k_{mult}}\), \(k=1,2,3\), and the optimum values of the constants involved are given by

$$\begin{aligned} \min MSE\left( T_{k}\right)= & {} \dfrac{{\bar{Y}}^{2}MSE\left( t_{k}\right) }{ {\bar{Y}}^{2}+MSE\left( t_{k}\right) }\hbox { and }\min MSE\left( T_{k_{mult}}\right) =\dfrac{{\bar{Y}}^{2}MSE\left( t_{k_{mult}}\right) }{ {\bar{Y}}^{2}+MSE\left( t_{k_{mult}}\right) }\\ \gamma _{1}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+S_{y}^{2}\left\{ \left( \frac{1}{ r}-\frac{1}{n}\right) +\left( \frac{1}{n}-\frac{1}{N}\right) \left( 1-R_{y.zx}^{2}\right) \right\} }, d_{1}=\gamma _{1}\frac{S_{y}\left( \rho _{yx}-\rho _{yz}\rho _{zx}\right) }{S_{x}\left( 1-\rho _{zx}^{2}\right) },\\ d_{2}= & {} \gamma _{1}\frac{S_{y}\left( \rho _{yz}-\rho _{yx}\rho _{zx}\right) }{ S_{z}\left( 1-\rho _{zx}^{2}\right) }\\ \gamma _{2}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+S_{y}^{2}\left\{ \left( \frac{1}{r}-\frac{1}{N}\right) \left( 1-R_{y.zx}^{2}\right) \right\} }, d_{3}=\gamma _{2}\frac{S_{y}\left( \rho _{yx}-\rho _{yz}\rho _{zx}\right) }{ S_{x}\left( 1-\rho _{zx}^{2}\right) }, \\ d_{4}= & {} \gamma _{2}\frac{S_{y}\left( \rho _{yz}-\rho _{yx}\rho _{zx}\right) }{S_{z}\left( 1-\rho _{zx}^{2}\right) } \\ \gamma _{3}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+S_{y}^{2}\left\{ \left( \frac{1}{ n}-\frac{1}{N}\right) +\left( \frac{1}{r}-\frac{1}{n}\right) \left( 1-R_{y.zx}^{2}\right) \right\} }, d_{5}=\gamma _{3}\frac{S_{y}\left( \rho _{yx}-\rho _{yz}\rho _{zx}\right) }{S_{x}\left( 1-\rho _{zx}^{2}\right) }, \\ d_{6}= & {} \gamma _{3}\frac{S_{y}\left( \rho _{yz}-\rho _{yx}\rho _{zx}\right) }{S_{z}\left( 1-\rho _{zx}^{2}\right) }\\ \gamma _{10}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+S_{y}^{2}\left\{ \left( \frac{1}{r}-\frac{1}{n}\right) +\left( \frac{1}{n}-\frac{1}{N}\right) \left( 1-R_{y.x_{1},x_{2},\ldots ,x_{p}}^{2}\right) \right\} }, \\ d_{j}= & {} \gamma _{10}\frac{S_{y}\left( \rho _{{yx}_j}-\rho _{yx_{i}}\rho _{x_{i}x_{j}}\right) }{ S_{x_{j}}\left( 1-\rho _{x_{i}x_{j}}^{2}\right) } \ (i\ne j=1,2,\ldots ,p)\\ \gamma _{11}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+S_{y}^{2}\left\{ \left( \frac{1 }{r}-\frac{1}{N}\right) \left( 1-R_{y.x_{1},x_{2},\ldots ,x_{p}}^{2}\right) \right\} },\\ d_{j}= & {} \gamma _{11}\frac{S_{y}\left( \rho _{{yx}_j}-\rho _{yx_{i}}\rho _{x_{i}x_{j}}\right) }{S_{x_{j}}\left( 1-\rho _{x_{i}x_{j}}^{2}\right) } \ (i\ne j=1,2,\ldots ,p)\\ \gamma _{12}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+S_{y}^{2}\left\{ \left( \frac{1 }{n}-\frac{1}{N}\right) +\left( \frac{1}{r}-\frac{1}{n}\right) \left( 1-R_{y.x_{1},x_{2},\ldots ,x_{p}}^{2}\right) \right\} },\\ d_{j}= & {} \gamma _{12}\frac{ S_{y}\left( \rho _{{yx}_j}-\rho _{yx_{i}}\rho _{x_{i}x_{j}}\right) }{ S_{x_{j}}\left( 1-\rho _{x_{i}x_{j}}^{2}\right) }\quad (i\ne j=1,2,\ldots ,p) \end{aligned}$$
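The form of these minima can be seen from a short argument. Here \(d_{1}=\gamma _{1}b_{1}\) and \(d_{2}=\gamma _{1}b_{2}\), where \(b_{1}=S_{y}(\rho _{yx}-\rho _{yz}\rho _{zx})/\{S_{x}(1-\rho _{zx}^{2})\}\) and \(b_{2}=S_{y}(\rho _{yz}-\rho _{yx}\rho _{zx})/\{S_{z}(1-\rho _{zx}^{2})\}\) are the coefficients b1 and b2 of the R code in Appendix 2, so \(T_{1}=\gamma _{1}t_{1}\) with \(t_{1}\) the regression-type estimator t1 of that code; the same holds for the other pairs. Assuming, for this sketch, that \(t_{k}\) is approximately unbiased for \({\bar{Y}}\), so that \(E(t_{k}^{2})\approx {\bar{Y}}^{2}+MSE(t_{k})\), we have

$$\begin{aligned} MSE\left( \gamma t_{k}\right) = & {} \gamma ^{2}E\left( t_{k}^{2}\right) -2\gamma {\bar{Y}}E\left( t_{k}\right) +{\bar{Y}}^{2}\approx \gamma ^{2}\left\{ {\bar{Y}}^{2}+MSE\left( t_{k}\right) \right\} -2\gamma {\bar{Y}}^{2}+{\bar{Y}}^{2},\\ \gamma _{opt}= & {} \frac{{\bar{Y}}^{2}}{{\bar{Y}}^{2}+MSE\left( t_{k}\right) },\qquad \min MSE\left( \gamma t_{k}\right) ={\bar{Y}}^{2}-\frac{{\bar{Y}}^{4}}{{\bar{Y}}^{2}+MSE\left( t_{k}\right) }=\frac{{\bar{Y}}^{2}MSE\left( t_{k}\right) }{{\bar{Y}}^{2}+MSE\left( t_{k}\right) }. \end{aligned}$$

With \(MSE(t_{1})=S_{y}^{2}\{(1/r-1/n)+(1/n-1/N)(1-R_{y.zx}^{2})\}\), the quantity m2 in the R code, \(\gamma _{opt}\) coincides with \(\gamma _{1}\).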

Outline of Derivation of Theorem 3.2. The derivation is shown for \(T_{4}\), whose MSE is given by

$$\begin{aligned} MSE\left( T_{4}\right)= & {} {\bar{Y}}^{2}\left[ 1+\gamma _{4}^{2}\left\{ 1+\left( \frac{1}{r}-\frac{1}{N}\right) C_{y}^{2}+\left( \frac{1}{n}-\frac{1}{N}\right) \left( 2K_{1}^{2}C_{x}^{2}+2K_{2}^{2}C_{z}^{2}+K_{1}\left( C_{x}^{2}-4\rho _{yx}C_{y}C_{x}\right) \right. \right. \right. \\&\left. \left. \left. +K_{2}\left( C_{z}^{2}-4\rho _{yz}C_{y}C_{z}\right) +4K_{1}K_{2}\rho _{zx}C_{z}C_{x}\right) \right\} \right. \\&\left. -2\gamma _{4}\left\{ 1+\left( \frac{1}{n}-\frac{1}{N}\right) \left( \dfrac{K_{1}^{2}}{2}C_{x}^{2}+\dfrac{K_{2}^{2}}{2}C_{z}^{2}+\dfrac{K_{1}}{2}\left( C_{x}^{2}-2\rho _{yx}C_{y}C_{x}\right) \right. \right. \right. \\&\left. \left. \left. +\dfrac{K_{2}}{2}\left( C_{z}^{2}-2\rho _{yz}C_{y}C_{z}\right) +K_{1}K_{2}\rho _{zx}C_{z}C_{x}\right) \right\} \right] \end{aligned}$$

which can be expressed as

$$\begin{aligned} MSE\left( T_{4}\right) ={\bar{Y}}^{2}\left[ 1+\gamma _{4}^{2}A_{1}-2\gamma _{4}B_{1}\right] \end{aligned}$$

To obtain the optimum value of \(\gamma _{4}\), we differentiate \(MSE\left( T_{4}\right) \) with respect to \(\gamma _{4}\) and equate the derivative to zero, which gives

$$\begin{aligned} \gamma _{4opt}=\dfrac{B_{1}}{A_{1}} \end{aligned}$$

Substituting the optimum value of \(\gamma _{4}\) in \(MSE\left( T_{4}\right) \), we get the minimum MSE

$$\begin{aligned} \min MSE\left( T_{4}\right) ={\bar{Y}}^{2}\left( 1-\dfrac{B_{1}^{2}}{A_{1}} \right) \end{aligned}$$

The derivations for the other estimators \(T_{j}\), \(j=5,6,\ldots ,9\), follow on similar lines. In general, we have

$$\begin{aligned} MSE(T_{j})={\bar{Y}}^{2}\left[ 1+\gamma _{j}^{2}A_{j}-2\gamma _{j}B_{j} \right] \end{aligned}$$
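Because \(MSE(T_{j})\) is a quadratic in \(\gamma _{j}\) with positive leading coefficient when \(A_{j}>0\), the closed-form optimum \(\gamma _{j}=B_{j}/A_{j}\) and the minimum \({\bar{Y}}^{2}(1-B_{j}^{2}/A_{j})\) can be checked numerically. The R sketch below uses illustrative values of \({\bar{Y}}\), A and B (assumptions, not values from the paper) and compares the closed form with `stats::optimize`.

```r
# MSE(T) = Ybar^2 * (1 + gamma^2 * A - 2 * gamma * B), a quadratic in gamma
mse_T <- function(gamma, Ybar, A, B) Ybar^2 * (1 + gamma^2 * A - 2 * gamma * B)

Ybar <- 10; A <- 1.2; B <- 1.05             # illustrative values only
gamma_opt <- B / A                           # closed-form optimum
mse_min   <- Ybar^2 * (1 - B^2 / A)          # closed-form minimum MSE

# numerical minimisation over gamma in [0, 2]; extra arguments go to mse_T
num <- optimize(mse_T, interval = c(0, 2), Ybar = Ybar, A = A, B = B, tol = 1e-10)
print(c(gamma_opt = gamma_opt, gamma_num = num$minimum,
        mse_min = mse_min, mse_num = num$objective))
```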

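The optimum values of the scalars involved are tabulated below for ready reference, where \(f_{r}=\frac{1}{r}-\frac{1}{N}\), \(f_{n}=\frac{1}{n}-\frac{1}{N}\) and \(f_{rn}=\frac{1}{r}-\frac{1}{n}\):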

$$\begin{aligned} \gamma _{jopt}= & {} \dfrac{B_{j}}{A_{j}}\ ,\ j=4,5,\ldots ,9.\\ A_{1}= & {} 1+f_{r}C_{y}^{2}+f_{n}\left\{ \begin{array}{c} 2K_{1}^{2}C_{x}^{2}+2K_{2}^{2}C_{z}^{2}+K_{1}\left( C_{x}^{2}-4\rho _{yx}C_{y}C_{x}\right) +K_{2}\left( C_{z}^{2}-4\rho _{yz}C_{y}C_{z}\right) \\ +4K_{1}K_{2}\rho _{zx}C_{z}C_{x} \end{array} \right\} \\ B_{1}= & {} 1+f_{n}\left\{ \dfrac{K_{1}^{2}}{2}C_{x}^{2}+\dfrac{K_{2}^{2}}{2} C_{z}^{2}+\dfrac{K_{1}}{2}\left( C_{x}^{2}-2\rho _{yx}C_{y}C_{x}\right) \right. \\&\left. + \dfrac{K_{2}}{2}\left( C_{z}^{2}-2\rho _{yz}C_{y}C_{z}\right) +K_{1}K_{2}\rho _{zx}C_{z}C_{x}\right\} \\ A_{2}= & {} 1+f_{r}C_{y}^{2}+f_{n}\left\{ 3\delta _{1}^{2}C_{x}^{2}+3\delta _{2}^{2}C_{z}^{2}-4\delta _{1}\rho _{yx}C_{y}C_{x}-4\delta _{2}\rho _{yz}C_{y}C_{z}+4\delta _{1}\delta _{2}\rho _{zx}C_{z}C_{x}\right\} \\ B_{2}= & {} 1+f_{n}\left\{ \delta _{1}^{2}C_{x}^{2}+\delta _{2}^{2}C_{z}^{2}-\delta _{1}\rho _{yx}C_{y}C_{x}-\delta _{2}\rho _{yz}C_{y}C_{z}+\delta _{1}\delta _{2}\rho _{zx}C_{z}C_{x}\right\} \\ A_{1mult}= & {} 1+f_{r}C_{y}^{2}+f_{n}\left\{ 2\sum \limits _{j=1}^{p}K_{j}^{2}C_{x_{j}}^{2}+\sum \limits _{j=1}^{p}K_{j}\left( C_{x_{j}}^{2}-4\rho _{yx_{j}}C_{y}C_{x_{j}}\right) \right. \\&\left. +4\sum \sum \limits _{i>j}^{p}K_{i}K_{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ B_{1mult}= & {} 1+f_{n}\left\{ \dfrac{1}{2}\sum \limits _{j=1}^{p}K_{j}^{2}C_{x_{j}}^{2}+\dfrac{1}{2}\sum \limits _{j=1}^{p}K_{j}\left( C_{x_{j}}^{2}-2\rho _{yx_{j}}C_{y}C_{x_{j}}\right) \right. \\&\left. +\sum \sum \limits _{i>j}^{p}K_{i}K_{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ A_{2mult}= & {} 1+f_{r}C_{y}^{2}+f_{n}\left\{ 3\sum \limits _{j=1}^{p}\delta _{j}^{2}C_{x_{j}}^{2}-4\sum \limits _{j=1}^{p}\delta _{j}\rho _{yx_{j}}C_{y}C_{x_{j}}+4\sum \sum \limits _{i>j}^{p}\delta _{i}\delta _{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ B_{2mult}= & {} 1+f_{n}\left\{ \sum \limits _{j=1}^{p}\delta _{j}^{2}C_{x_{j}}^{2}-\sum \limits _{j=1}^{p}\delta _{j}\rho _{yx_{j}}C_{y}C_{x_{j}}+\sum \sum \limits _{i>j}^{p}\delta _{i}\delta _{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ A_{3}= & {} 1+f_{r}\left\{ \begin{array}{c} C_{y}^{2}+2K_{3}^{2}C_{x}^{2}+2K_{4}^{2}C_{z}^{2}+K_{3}\left( C_{x}^{2}-4\rho _{yx}C_{y}C_{x}\right) +K_{4}\left( C_{z}^{2}-4\rho _{yz}C_{y}C_{z}\right) + \\ 4K_{3}K_{4}\rho _{zx}C_{z}C_{x} \end{array} \right\} \\ B_{3}= & {} 1+f_{r}\left\{ \dfrac{K_{3}^{2}}{2}C_{x}^{2}+\dfrac{K_{4}^{2}}{2} C_{z}^{2}+\dfrac{K_{3}}{2}\left( C_{x}^{2}-2\rho _{yx}C_{y}C_{x}\right) \right. \\&\left. + \dfrac{K_{4}}{2}\left( C_{z}^{2}-2\rho _{yz}C_{y}C_{z}\right) +K_{3}K_{4}\rho _{zx}C_{z}C_{x}\right\} \\ \end{aligned}$$
$$\begin{aligned} A_{4}= & {} 1+f_{r}\left\{ C_{y}^{2}+3\delta _{3}^{2}C_{x}^{2}+3\delta _{4}^{2}C_{z}^{2}-4\delta _{3}\rho _{yx}C_{y}C_{x}-4\delta _{4}\rho _{yz}C_{y}C_{z}+4\delta _{3}\delta _{4}\rho _{zx}C_{z}C_{x}\right\} \\ B_{4}= & {} 1+f_{r}\left\{ \delta _{3}^{2}C_{x}^{2}+\delta _{4}^{2}C_{z}^{2}-\delta _{3}\rho _{yx}C_{y}C_{x}-\delta _{4}\rho _{yz}C_{y}C_{z}+\delta _{3}\delta _{4}\rho _{zx}C_{z}C_{x}\right\} \\ A_{3mult}= & {} 1+f_{r}\left\{ C_{y}^{2}+2\sum \limits _{j=1}^{p}K_{j}^{2}C_{x_{j}}^{2}+\sum \limits _{j=1}^{p}K_{j}\left( C_{x_{j}}^{2}-4\rho _{yx_{j}}C_{y}C_{x_{j}}\right) \right. \\&\left. +4\sum \sum \limits _{i>j}^{p}K_{i}K_{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ B_{3mult}= & {} 1+f_{r}\left\{ \dfrac{1}{2}\sum \limits _{j=1}^{p}K_{j}^{2}C_{x_{j}}^{2}+\dfrac{1}{2}\sum \limits _{j=1}^{p}K_{j}\left( C_{x_{j}}^{2}-2\rho _{yx_{j}}C_{y}C_{x_{j}}\right) \right. \\&\left. +\sum \sum \limits _{i>j}^{p}K_{i}K_{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ A_{4mult}= & {} 1+f_{r}\left\{ C_{y}^{2}+3\sum \limits _{j=1}^{p}\delta _{j}^{2}C_{x_{j}}^{2}-4\sum \limits _{j=1}^{p}\delta _{j}\rho _{yx_{j}}C_{y}C_{x_{j}}+4\sum \sum \limits _{i>j}^{p}\delta _{i}\delta _{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ B_{4mult}= & {} 1+f_{r}\left\{ \sum \limits _{j=1}^{p}\delta _{j}^{2}C_{x_{j}}^{2}-\sum \limits _{j=1}^{p}\delta _{j}\rho _{yx_{j}}C_{y}C_{x_{j}}+\sum \sum \limits _{i>j}^{p}\delta _{i}\delta _{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \end{aligned}$$
$$\begin{aligned} A_{5}= & {} 1+f_{r}C_{y}^{2}+f_{rn}\left\{ \begin{array}{c} 2K_{5}^{2}C_{x}^{2}+2K_{6}^{2}C_{z}^{2}+K_{5}\left( C_{x}^{2}-4\rho _{yx}C_{y}C_{x}\right) +K_{6}\left( C_{z}^{2}-4\rho _{yz}C_{y}C_{z}\right) + \\ 4K_{5}K_{6}\rho _{zx}C_{z}C_{x} \end{array} \right\} \end{aligned}$$
$$\begin{aligned} B_{5}= & {} 1+f_{rn}\left\{ \begin{array}{c} \dfrac{K_{5}^{2}}{2}C_{x}^{2}+\dfrac{K_{6}^{2}}{2}C_{z}^{2}+\dfrac{K_{5}}{2} \left( C_{x}^{2}-2\rho _{yx}C_{y}C_{x}\right) +\dfrac{K_{6}}{2}\left( C_{z}^{2}-2\rho _{yz}C_{y}C_{z}\right) + \\ K_{5}K_{6}\rho _{zx}C_{z}C_{x} \end{array} \right\} \end{aligned}$$
$$\begin{aligned} A_{6}= & {} 1+f_{r}C_{y}^{2}+f_{rn}\left\{ 3\delta _{5}^{2}C_{x}^{2}+3\delta _{6}^{2}C_{z}^{2}-4\delta _{5}\rho _{yx}C_{y}C_{x}-4\delta _{6}\rho _{yz}C_{y}C_{z}+4\delta _{5}\delta _{6}\rho _{zx}C_{z}C_{x}\right\} \\ B_{6}= & {} 1+f_{rn}\left\{ \delta _{5}^{2}C_{x}^{2}+\delta _{6}^{2}C_{z}^{2}-\delta _{5}\rho _{yx}C_{y}C_{x}-\delta _{6}\rho _{yz}C_{y}C_{z}+\delta _{5}\delta _{6}\rho _{zx}C_{z}C_{x}\right\} \\ A_{5mult}= & {} 1+f_{r}C_{y}^{2}+f_{rn}\left\{ 2\sum \limits _{j=1}^{p}K_{j}^{2}C_{x_{j}}^{2}+\sum \limits _{j=1}^{p}K_{j}\left( C_{x_{j}}^{2}-4\rho _{yx_{j}}C_{y}C_{x_{j}}\right) \right. \\&\left. +4\sum \sum \limits _{i>j}^{p}K_{i}K_{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ B_{5mult}= & {} 1+f_{rn}\left\{ \dfrac{1}{2}\sum \limits _{j=1}^{p}K_{j}^{2}C_{x_{j}}^{2}+\dfrac{1}{2}\sum \limits _{j=1}^{p}K_{j}\left( C_{x_{j}}^{2}-2\rho _{yx_{j}}C_{y}C_{x_{j}}\right) \right. \\&\left. +\sum \sum \limits _{i>j}^{p}K_{i}K_{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \\ A_{6mult}= & {} 1+f_{r}C_{y}^{2}+f_{rn}\left\{ 3\sum \limits _{j=1}^{p}\delta _{j}^{2}C_{x_{j}}^{2}-4\sum \limits _{j=1}^{p}\delta _{j}\rho _{yx_{j}}C_{y}C_{x_{j}}\right. \\&\left. +4\sum \sum \limits _{i>j}^{p}\delta _{i}\delta _{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \end{aligned}$$
$$\begin{aligned} B_{6mult}= & {} 1+f_{rn}\left\{ \sum \limits _{j=1}^{p}\delta _{j}^{2}C_{x_{j}}^{2}-\sum \limits _{j=1}^{p}\delta _{j}\rho _{yx_{j}}C_{y}C_{x_{j}}+\sum \sum \limits _{i>j}^{p}\delta _{i}\delta _{j}\rho _{x_{i}x_{j}}C_{x_{i}}C_{x_{j}}\right\} \end{aligned}$$
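The multi-auxiliary constants should reduce to the two-auxiliary ones when p = 2, with \(x_{1}=x\) and \(x_{2}=z\). The R sketch below codes \(A_{1}, B_{1}\) and \(A_{1mult}, B_{1mult}\) as tabulated, reading the quadratic term of \(A_{1mult}\) as \(K_{j}^{2}C_{x_{j}}^{2}\), and checks that they agree. The function names `AB_two` and `AB_mult` are hypothetical, and the coefficients of variation and correlations are illustrative assumptions; only n, r, N are taken from the simulation design in Appendix 2.

```r
# A_1, B_1 for two auxiliaries x and z (strategy 1, estimator T_4)
AB_two <- function(Cy, Cx, Cz, ryx, ryz, rzx, K1, K2, fr, fn) {
  A <- 1 + fr * Cy^2 + fn * (2 * K1^2 * Cx^2 + 2 * K2^2 * Cz^2 +
         K1 * (Cx^2 - 4 * ryx * Cy * Cx) + K2 * (Cz^2 - 4 * ryz * Cy * Cz) +
         4 * K1 * K2 * rzx * Cz * Cx)
  B <- 1 + fn * (K1^2 / 2 * Cx^2 + K2^2 / 2 * Cz^2 +
         K1 / 2 * (Cx^2 - 2 * ryx * Cy * Cx) + K2 / 2 * (Cz^2 - 2 * ryz * Cy * Cz) +
         K1 * K2 * rzx * Cz * Cx)
  c(A = A, B = B)
}

# A_1mult, B_1mult for p auxiliaries: Cxv, ryxv, K are length-p vectors and
# Rxx is the p x p correlation matrix of the auxiliaries
AB_mult <- function(Cy, Cxv, ryxv, Rxx, K, fr, fn) {
  p <- length(K)
  cross <- 0                                  # double sum over pairs i > j
  if (p > 1) {
    for (i in 2:p) {
      for (j in seq_len(i - 1)) {
        cross <- cross + K[i] * K[j] * Rxx[i, j] * Cxv[i] * Cxv[j]
      }
    }
  }
  A <- 1 + fr * Cy^2 + fn * (2 * sum(K^2 * Cxv^2) +
         sum(K * (Cxv^2 - 4 * ryxv * Cy * Cxv)) + 4 * cross)
  B <- 1 + fn * (sum(K^2 * Cxv^2) / 2 +
         sum(K * (Cxv^2 - 2 * ryxv * Cy * Cxv)) / 2 + cross)
  c(A = A, B = B)
}

N <- 2000; n <- 256; r <- 204
fr <- 1 / r - 1 / N; fn <- 1 / n - 1 / N
Cy <- 0.5; Cx <- 0.6; Cz <- 0.4              # illustrative values
ryx <- 0.8; ryz <- 0.7; rzx <- 0.49
K1 <- Cy * (ryx - ryz * rzx) / (Cx * (1 - rzx^2))
K2 <- Cy * (ryz - ryx * rzx) / (Cz * (1 - rzx^2))

two  <- AB_two(Cy, Cx, Cz, ryx, ryz, rzx, K1, K2, fr, fn)
mult <- AB_mult(Cy, c(Cx, Cz), c(ryx, ryz), matrix(c(1, rzx, rzx, 1), 2),
                c(K1, K2), fr, fn)
print(rbind(two, mult))
```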

The optimum values of the constants involved in the estimators are

$$\begin{aligned} K_{1}= & {} K_{3}=K_{5}=\frac{C_{y}\left( \rho _{yx}-\rho _{yz}\rho _{zx}\right) }{C_{x}\left( 1-\rho _{zx}^{2}\right) }, K_{2}=K_{4}=K_{6}=\frac{C_{y}\left( \rho _{yz}-\rho _{yx}\rho _{zx}\right) }{ C_{z}\left( 1-\rho _{zx}^{2}\right) }\\ \delta _{1}= & {} \delta _{3}=\delta _{5}=\frac{C_{y}\left( \rho _{yx}-\rho _{yz}\rho _{zx}\right) }{C_{x}\left( 1-\rho _{zx}^{2}\right) }, \delta _{2}=\delta _{4}=\delta _{6}=\frac{C_{y}\left( \rho _{yz}-\rho _{yx}\rho _{zx}\right) }{ C_{z}\left( 1-\rho _{zx}^{2}\right) }. \end{aligned}$$
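The constants \(K_{1}, K_{2}\) (equivalently \(\delta _{1}, \delta _{2}\)) are the standardized partial regression coefficients of y on x and z, rescaled by \(C_{y}/C_{x}\) and \(C_{y}/C_{z}\). A minimal R sketch obtains them by solving the normal equations \(R_{xx}\beta =r_{xy}\) and checks the result against the closed forms above for p = 2. Applying the same construction for p > 2 is an assumption made here for illustration; the function name `K_opt` and the numerical inputs are hypothetical.

```r
# K_j = (Cy / Cx_j) * beta_j, where beta solves Rxx %*% beta = ryx, i.e. beta holds
# the standardized partial regression coefficients of y on x_1, ..., x_p
K_opt <- function(Cy, Cxv, ryxv, Rxx) {
  beta <- solve(Rxx, ryxv)
  Cy / Cxv * beta
}

Cy <- 0.5; Cx <- 0.6; Cz <- 0.4              # illustrative values
ryx <- 0.8; ryz <- 0.7; rzx <- 0.49

# closed forms for K_1 and K_2 as given above
K_closed <- c(Cy * (ryx - ryz * rzx) / (Cx * (1 - rzx^2)),
              Cy * (ryz - ryx * rzx) / (Cz * (1 - rzx^2)))
K_solve  <- K_opt(Cy, c(Cx, Cz), c(ryx, ryz), matrix(c(1, rzx, rzx, 1), 2))
print(rbind(K_closed, K_solve))
```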

Appendix 2

```r
library(MASS)    # mvrnorm()
library(e1071)

set.seed(989898)

N <- 2000; n <- 256; r <- 204

# trivariate normal population for (Y, X, Z)
sd.vec <- c(10, 10, 10); royx <- 0.8; royz <- 0.7; rozx <- 0.49
cor.mat <- matrix(c(1, royx, royz,
                    royx, 1, rozx,
                    royz, rozx, 1), ncol = 3)
cov.mat <- diag(sd.vec) %*% cor.mat %*% diag(sd.vec)
dat1 <- mvrnorm(n = N, mu = c(5, 5, 5), Sigma = cov.mat, empirical = FALSE)
dat <- as.data.frame(dat1)
Y <- dat$V1; X <- dat$V2; Z <- dat$V3

# population parameters
Mx <- mean(X); My <- mean(Y); Mz <- mean(Z)
Vx <- var(X); Vy <- var(Y); Vz <- var(Z)
Sx <- sd(X); Sy <- sd(Y); Sz <- sd(Z)
Cx <- Sx/Mx; Cy <- Sy/My; Cz <- Sz/Mz
Sxy <- cov(Y, X); Syz <- cov(Y, Z); Szx <- cov(Z, X)
ryx <- cor(Y, X); ryz <- cor(Y, Z); rzx <- cor(Z, X)

# multiple correlation coefficient R_{y.zx}, so that Ryzx^2 = R^2_{y.zx}
Ryzx <- sqrt((ryx^2 + ryz^2 - 2*ryx*ryz*rzx) / (1 - rzx^2))
R <- My/Mx
fn <- 1/n - 1/N; fr <- 1/r - 1/N; frn <- 1/r - 1/n

# regression coefficients used in t1-t3
b1 <- b3 <- b5 <- Sy*(ryx - ryz*rzx) / (Sx*(1 - rzx^2))
b2 <- b4 <- b6 <- Sy*(ryz - ryx*rzx) / (Sz*(1 - rzx^2))

# MSE of mean imputation (m1), MSEs of t1-t3 (m2-m4), minimum MSEs of T1-T3 (M1-M3)
m1 <- fr*Sy^2
m2 <- Sy^2*(frn + fn*(1 - Ryzx^2))
m3 <- Sy^2*fr*(1 - Ryzx^2)
m4 <- Sy^2*(fn + frn*(1 - Ryzx^2))
M1 <- My^2*m2/(My^2 + m2); M2 <- My^2*m3/(My^2 + m3); M3 <- My^2*m4/(My^2 + m4)

# optimum constants of T1-T3 (d = g * b)
g1 <- My^2/(My^2 + m2); d1 <- g1*b1; d2 <- g1*b2
g2 <- My^2/(My^2 + m3); d3 <- g2*b3; d4 <- g2*b4
g3 <- My^2/(My^2 + m4); d5 <- g3*b5; d6 <- g3*b6

# power constants (K) and ratio-type weights (a)
K1 <- K3 <- K5 <- Cy*(ryx - ryz*rzx) / (Cx*(1 - rzx^2))
K2 <- K4 <- K6 <- Cy*(ryz - ryx*rzx) / (Cz*(1 - rzx^2))
a1 <- a3 <- a5 <- K1
a2 <- a4 <- a6 <- K2

# A and B constants of T4-T9
A1 <- 1 + fr*Cy^2 + fn*(2*K1^2*Cx^2 + 2*K2^2*Cz^2 + K1*(Cx^2 - 4*ryx*Cy*Cx) +
        K2*(Cz^2 - 4*ryz*Cy*Cz) + 4*K1*K2*rzx*Cz*Cx)
B1 <- 1 + fn*(K1^2/2*Cx^2 + K2^2/2*Cz^2 + K1/2*(Cx^2 - 2*ryx*Cy*Cx) +
        K2/2*(Cz^2 - 2*ryz*Cy*Cz) + K1*K2*rzx*Cz*Cx)
A2 <- 1 + fr*(Cy^2 + 2*K3^2*Cx^2 + 2*K4^2*Cz^2 + K3*(Cx^2 - 4*ryx*Cy*Cx) +
        K4*(Cz^2 - 4*ryz*Cy*Cz) + 4*K3*K4*rzx*Cz*Cx)
B2 <- 1 + fr*(K3^2/2*Cx^2 + K4^2/2*Cz^2 + K3/2*(Cx^2 - 2*ryx*Cy*Cx) +
        K4/2*(Cz^2 - 2*ryz*Cy*Cz) + K3*K4*rzx*Cz*Cx)
A3 <- 1 + fr*Cy^2 + frn*(2*K5^2*Cx^2 + 2*K6^2*Cz^2 + K5*(Cx^2 - 4*ryx*Cy*Cx) +
        K6*(Cz^2 - 4*ryz*Cy*Cz) + 4*K5*K6*rzx*Cz*Cx)
B3 <- 1 + frn*(K5^2/2*Cx^2 + K6^2/2*Cz^2 + K5/2*(Cx^2 - 2*ryx*Cy*Cx) +
        K6/2*(Cz^2 - 2*ryz*Cy*Cz) + K5*K6*rzx*Cz*Cx)
A4 <- 1 + fr*Cy^2 + fn*(3*a1^2*Cx^2 + 3*a2^2*Cz^2 - 4*a1*ryx*Cy*Cx -
        4*a2*ryz*Cy*Cz + 4*a1*a2*rzx*Cz*Cx)
B4 <- 1 + fn*(a1^2*Cx^2 + a2^2*Cz^2 - a1*ryx*Cy*Cx - a2*ryz*Cy*Cz + a1*a2*rzx*Cz*Cx)
A5 <- 1 + fr*(Cy^2 + 3*a3^2*Cx^2 + 3*a4^2*Cz^2 - 4*a3*ryx*Cy*Cx -
        4*a4*ryz*Cy*Cz + 4*a3*a4*rzx*Cz*Cx)
B5 <- 1 + fr*(a3^2*Cx^2 + a4^2*Cz^2 - a3*ryx*Cy*Cx - a4*ryz*Cy*Cz + a3*a4*rzx*Cz*Cx)
A6 <- 1 + fr*Cy^2 + frn*(3*a5^2*Cx^2 + 3*a6^2*Cz^2 - 4*a5*ryx*Cy*Cx -
        4*a6*ryz*Cy*Cz + 4*a5*a6*rzx*Cz*Cx)
B6 <- 1 + frn*(a5^2*Cx^2 + a6^2*Cz^2 - a5*ryx*Cy*Cx - a6*ryz*Cy*Cz + a5*a6*rzx*Cz*Cx)

# optimum scalars and minimum MSEs of T4-T9
g4 <- B1/A1; g5 <- B2/A2; g6 <- B3/A3
g7 <- B4/A4; g8 <- B5/A5; g9 <- B6/A6
M4 <- My^2*(1 - B1^2/A1); M5 <- My^2*(1 - B2^2/A2); M6 <- My^2*(1 - B3^2/A3)
M7 <- My^2*(1 - B4^2/A4); M8 <- My^2*(1 - B5^2/A5); M9 <- My^2*(1 - B6^2/A6)

# percent relative efficiencies with respect to mean imputation
p1 <- m1/m2*100; P1 <- m1/M1*100; P4 <- m1/M4*100; P7 <- m1/M7*100   # strategy 1
p2 <- m1/m3*100; P2 <- m1/M2*100; P5 <- m1/M5*100; P8 <- m1/M8*100   # strategy 2
p3 <- m1/m4*100; P3 <- m1/M3*100; P6 <- m1/M6*100; P9 <- m1/M9*100   # strategy 3
print(c(p1 = p1, P1 = P1, P4 = P4, P7 = P7, p2 = p2, P2 = P2, P5 = P5, P8 = P8,
        p3 = p3, P3 = P3, P6 = P6, P9 = P9))
```

R Code for Simulation Study

```r
# Monte Carlo study: uses dat, N, n, r and the constants computed above
nsim <- 50000
Ybar <- mean(Y); Xbar <- mean(X); Zbar <- mean(Z)       # population means
M10 <- M11 <- M12 <- M13 <- M14 <- M15 <- M16 <- numeric(nsim)
M21 <- M22 <- M23 <- M24 <- M25 <- M26 <- numeric(nsim)
M31 <- M32 <- M33 <- M34 <- M35 <- M36 <- numeric(nsim)

for (i in 1:nsim) {
  smp1 <- sample(1:N, n, replace = FALSE)    # SRSWOR sample of n units
  mar1 <- dat[smp1, ]
  smp2 <- sample(1:n, r, replace = FALSE)    # r responding units
  mar2 <- mar1[smp2, ]
  xbn <- mean(mar1[, 2]); zbn <- mean(mar1[, 3])                          # sample means
  ybr <- mean(mar2[, 1]); xbr <- mean(mar2[, 2]); zbr <- mean(mar2[, 3])  # respondent means

  M10[i] <- (ybr - Ybar)^2                   # mean imputation

  # regression-type estimators
  t1 <- ybr + b1*(Xbar - xbn) + b2*(Zbar - zbn)
  t2 <- ybr + b3*(Xbar - xbr) + b4*(Zbar - zbr)
  t3 <- ybr + b5*(xbn - xbr) + b6*(zbn - zbr)
  T1 <- g1*ybr + d1*(Xbar - xbn) + d2*(Zbar - zbn)
  T2 <- g2*ybr + d3*(Xbar - xbr) + d4*(Zbar - zbr)
  T3 <- g3*ybr + d5*(xbn - xbr) + d6*(zbn - zbr)

  # power-transformation-type estimators
  t4 <- ybr*(Xbar/xbn)^K1*(Zbar/zbn)^K2
  t5 <- ybr*(Xbar/xbr)^K3*(Zbar/zbr)^K4
  t6 <- ybr*(xbn/xbr)^K5*(zbn/zbr)^K6
  T4 <- g4*t4; T5 <- g5*t5; T6 <- g6*t6

  # ratio-type estimators with weights a1-a6
  t7 <- ybr*(Xbar/(a1*xbn + (1 - a1)*Xbar))*(Zbar/(a2*zbn + (1 - a2)*Zbar))
  t8 <- ybr*(Xbar/(a3*xbr + (1 - a3)*Xbar))*(Zbar/(a4*zbr + (1 - a4)*Zbar))
  t9 <- ybr*(Xbar/(a5*xbn + (1 - a5)*xbr))*(Zbar/(a6*zbn + (1 - a6)*zbr))
  T7 <- g7*t7; T8 <- g8*t8; T9 <- g9*t9

  # squared errors
  M11[i] <- (t1 - Ybar)^2; M12[i] <- (t2 - Ybar)^2; M13[i] <- (t3 - Ybar)^2
  M14[i] <- (T1 - Ybar)^2; M15[i] <- (T2 - Ybar)^2; M16[i] <- (T3 - Ybar)^2
  M21[i] <- (t4 - Ybar)^2; M22[i] <- (t5 - Ybar)^2; M23[i] <- (t6 - Ybar)^2
  M24[i] <- (T4 - Ybar)^2; M25[i] <- (T5 - Ybar)^2; M26[i] <- (T6 - Ybar)^2
  M31[i] <- (t7 - Ybar)^2; M32[i] <- (t8 - Ybar)^2; M33[i] <- (t9 - Ybar)^2
  M34[i] <- (T7 - Ybar)^2; M35[i] <- (T8 - Ybar)^2; M36[i] <- (T9 - Ybar)^2
}

# simulated MSEs
m10 <- mean(M10); m11 <- mean(M11); m12 <- mean(M12); m13 <- mean(M13)
m14 <- mean(M14); m15 <- mean(M15); m16 <- mean(M16)
m21 <- mean(M21); m22 <- mean(M22); m23 <- mean(M23)
m24 <- mean(M24); m25 <- mean(M25); m26 <- mean(M26)
m31 <- mean(M31); m32 <- mean(M32); m33 <- mean(M33)
m34 <- mean(M34); m35 <- mean(M35); m36 <- mean(M36)
print(c(m10 = m10, m11 = m11, m12 = m12, m13 = m13, m14 = m14, m15 = m15, m16 = m16,
        m21 = m21, m22 = m22, m23 = m23, m24 = m24, m25 = m25, m26 = m26,
        m31 = m31, m32 = m32, m33 = m33, m34 = m34, m35 = m35, m36 = m36))

# percent relative efficiencies with respect to mean imputation
PRE10 <- 100
# Strategy 1
PRE11 <- m10/m11*100; PRE14 <- m10/m14*100; PRE21 <- m10/m21*100
PRE24 <- m10/m24*100; PRE31 <- m10/m31*100; PRE34 <- m10/m34*100
print(c(PRE11 = PRE11, PRE14 = PRE14, PRE21 = PRE21, PRE24 = PRE24, PRE31 = PRE31, PRE34 = PRE34))
# Strategy 2
PRE12 <- m10/m12*100; PRE15 <- m10/m15*100; PRE22 <- m10/m22*100
PRE25 <- m10/m25*100; PRE32 <- m10/m32*100; PRE35 <- m10/m35*100
print(c(PRE12 = PRE12, PRE15 = PRE15, PRE22 = PRE22, PRE25 = PRE25, PRE32 = PRE32, PRE35 = PRE35))
# Strategy 3
PRE13 <- m10/m13*100; PRE16 <- m10/m16*100; PRE23 <- m10/m23*100
PRE26 <- m10/m26*100; PRE33 <- m10/m33*100; PRE36 <- m10/m36*100
print(c(PRE13 = PRE13, PRE16 = PRE16, PRE23 = PRE23, PRE26 = PRE26, PRE33 = PRE33, PRE36 = PRE36))
```
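A compact base-R sketch of the same Monte Carlo idea for a single comparison, the strategy 1 regression-type estimator \(t_{1}\) against mean imputation, is given below. It draws the trivariate normal population with a Cholesky factor instead of `MASS::mvrnorm`, uses fewer replications (5000, an arbitrary choice), and assumes that the r respondents are a simple random subsample of the n sampled units. The simulated PRE is compared with the theoretical value \(100\,m_{1}/m_{2}\) from the formulas above; all variable names are illustrative.

```r
set.seed(2021)
N <- 2000; n <- 256; r <- 204; nsim <- 5000
rho <- matrix(c(1,   0.8,  0.7,
                0.8, 1,    0.49,
                0.7, 0.49, 1), ncol = 3)
Sigma <- diag(c(10, 10, 10)) %*% rho %*% diag(c(10, 10, 10))
# rows of pop are draws from N_3(mu = (5, 5, 5), Sigma) via a Cholesky factor
pop <- matrix(rnorm(3 * N), nrow = N) %*% chol(Sigma) + 5
Y <- pop[, 1]; X <- pop[, 2]; Z <- pop[, 3]

# population regression coefficients, same form as b1, b2 in Appendix 2
ryx <- cor(Y, X); ryz <- cor(Y, Z); rzx <- cor(Z, X)
b1 <- sd(Y) * (ryx - ryz * rzx) / (sd(X) * (1 - rzx^2))
b2 <- sd(Y) * (ryz - ryx * rzx) / (sd(Z) * (1 - rzx^2))

sq_err <- replicate(nsim, {
  s    <- sample.int(N, n)                # SRSWOR sample of n units
  resp <- s[sample.int(n, r)]             # r responding units
  ybr  <- mean(Y[resp])
  t1   <- ybr + b1 * (mean(X) - mean(X[s])) + b2 * (mean(Z) - mean(Z[s]))
  c(mean_imp = (ybr - mean(Y))^2, t1 = (t1 - mean(Y))^2)
})
mse <- rowMeans(sq_err)                   # simulated MSEs
PRE <- mse[["mean_imp"]] / mse[["t1"]] * 100

# theoretical PRE: m1 / m2 * 100 with m1 = fr*Sy^2, m2 = Sy^2*(frn + fn*(1 - R^2))
R2 <- (ryx^2 + ryz^2 - 2 * ryx * ryz * rzx) / (1 - rzx^2)
fr <- 1 / r - 1 / N; fn <- 1 / n - 1 / N; frn <- 1 / r - 1 / n
PRE_theory <- fr / (frn + fn * (1 - R2)) * 100
print(c(PRE_sim = PRE, PRE_theory = PRE_theory))
```

Under this design the simulated and theoretical PREs should agree up to Monte Carlo error of a few percent.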

About this article

Cite this article

Bhushan, S., Pandey, A.P. Optimal imputation of the missing data using multi auxiliary information. Comput Stat 36, 449–477 (2021). https://doi.org/10.1007/s00180-020-01016-9

  • DOI: https://doi.org/10.1007/s00180-020-01016-9
