A predictive estimator of finite population proportion despite missing data
Introduction
The use of auxiliary population information, provided by one or several auxiliary variables, at the estimation stage is a commonly used technique that offers many advantages (see e.g., [6], [19], [20], [4], [17], [23]). However, in many practical situations, instead of auxiliary variables there exist certain auxiliary attributes which are correlated with the study variable. Abd-Elfattah et al. [1], Grover and Kaur [10], Koyuncu [13] and Singh and Solanki [18] proposed a set of estimators for the mean using information on a single auxiliary attribute, in simple random sampling, and this was later extended by Malik and Singh [14] to the case of two attributes. All these papers assume that there is no nonresponse.
Information on auxiliary attributes can also be used to deal with missing values, a problem that commonly arises in survey research and often causes severe difficulties. A variety of methods have been developed to compensate for missing data in a general-purpose way so that the survey data file can be analysed irrespective of the missing data (see e.g., [19, chapter 12]).
When sample observations are missing, the simplest solution is to eliminate the incomplete observations, but this can bias the estimates and increase the sampling variance. Another solution is to employ imputation techniques to replace the missing observations (see e.g., [12], [5]). However, this practice may invalidate the inferences and can often have serious consequences. Considering that the missing observations may contain valuable information, a third option is to attempt to improve the precision of the estimators by including all cases available for their calculation. Some authors have defined indirect estimators for means and variances under simple random sampling without replacement when some observations are missing (see e.g., [27], [24], [25], [26], [21], [22], [16]).
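The bias caused by discarding incomplete observations can be illustrated with a small simulation. The sketch below is not taken from the paper: the population size, the response probabilities and the function names are all hypothetical, and it assumes simple random sampling without replacement with nonresponse that depends on the study attribute itself.

```python
import random

def complete_case_proportion(y, responded):
    """Proportion of units with the attribute, computed from respondents only."""
    observed = [yi for yi, r in zip(y, responded) if r]
    if not observed:
        raise ValueError("no respondents in the sample")
    return sum(observed) / len(observed)

def simulate(N=10_000, n=500, p_true=0.4, resp_if_1=0.9, resp_if_0=0.6,
             reps=200, seed=1):
    """Average the complete-case estimate over repeated SRSWOR samples.

    All parameter values are illustrative assumptions. Units with the
    attribute respond with probability resp_if_1, the others with resp_if_0.
    """
    rng = random.Random(seed)
    population = [1 if rng.random() < p_true else 0 for _ in range(N)]
    P = sum(population) / N
    estimates = []
    for _ in range(reps):
        sample = rng.sample(population, n)  # sampling without replacement
        responded = [rng.random() < (resp_if_1 if yi else resp_if_0)
                     for yi in sample]
        estimates.append(complete_case_proportion(sample, responded))
    return P, sum(estimates) / reps

P, mean_cc = simulate()
print(f"true proportion: {P:.3f}, mean complete-case estimate: {mean_cc:.3f}")
```

Because units with the attribute respond more often in this assumed setup, the complete-case estimate is pulled above the true proportion, and a larger sample does not remove the gap.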
However, the estimation of a population proportion in the presence of missing data is a problem that has received little research attention. Álvarez et al. [3] recently defined a general class of estimators of a population proportion on the basis of a random sample drawn according to any sampling design, and assuming an auxiliary attribute whose population proportion is known from a census or estimated without sampling errors.
Estimating a single proportion is a common task in many practical and research situations (biopharmaceutical experiments, clinical research, marketing research, opinion surveys, polls, etc.). These surveys often contain auxiliary information on several variables (including numeric and binary attributes). In this study, we seek to build a new estimator that makes use of the information in the sample for the study and auxiliary variables (quantitative or attribute), to estimate the population proportion, on the basis of a logistic regression superpopulation model.
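The general idea of a predictive estimator under a logistic model can be sketched as follows. This is a minimal illustration and not the estimator defined in the paper: it assumes simple random sampling without replacement, an unweighted logistic fit on the responding units, two auxiliary variables (one quantitative, `x`, and one binary, `z`) known for every population unit, and hypothetical function names. The estimate adds the observed values to the fitted probabilities of all remaining units and divides by N.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        grad = X.T @ (y - p)
        # A tiny ridge term keeps the Hessian invertible (numerical safeguard).
        hess = (X * W[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def predictive_proportion(X_pop, y, observed):
    """(sum of observed y + sum of fitted probabilities for the other units) / N.

    Only y[observed] is used; the remaining entries of y are treated as unknown.
    """
    beta = fit_logistic(X_pop[observed], y[observed])
    p_hat = 1.0 / (1.0 + np.exp(-X_pop @ beta))
    return (y[observed].sum() + p_hat[~observed].sum()) / len(y)

# Hypothetical population: all numeric values below are illustrative assumptions.
rng = np.random.default_rng(0)
N, n = 5000, 1000
x = rng.normal(size=N)                      # quantitative auxiliary variable
z = (rng.random(N) < 0.5).astype(int)       # auxiliary attribute
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 0.8 * z)))
y = (rng.random(N) < prob).astype(int)      # study attribute indicator
X_pop = np.column_stack([np.ones(N), x, z])

sampled = np.zeros(N, dtype=bool)
sampled[rng.choice(N, size=n, replace=False)] = True
responded = rng.random(N) < np.where(z == 1, 0.8, 0.6)  # response depends on z
observed = sampled & responded

P_true = y.mean()
P_pred = predictive_proportion(X_pop, y, observed)
print(f"true proportion: {P_true:.3f}, predictive estimate: {P_pred:.3f}")
```

A design-weighted fit would be needed for unequal-probability designs; the unweighted fit is used here only to keep the sketch short.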
In Section 2, we introduce the problem of the estimation of a proportion when there are missing values. We define a new estimator of the population proportion in the case of a general sampling design, assuming that two auxiliary variables (quantitative or attribute) are available. Assuming different scenarios, the proposed point estimators are evaluated empirically in Section 3, and we report that the conclusions obtained are consistent with the theoretical properties derived in the previous sections.
Section snippets
Proportion estimators in the presence of missing values
Let $U = \{1, \ldots, N\}$ be a population of N identifiable elements. We consider the problem of estimating the population proportion $P = \frac{1}{N}\sum_{i \in U} y_i$, where $y_i$ is an attribute indicator for unit i, i.e., $y_i = 1$ if unit i has the attribute of interest A, and $y_i = 0$ otherwise. $P$ is the parameter of interest, which needs to be estimated. For this purpose, a random sample s, of size n, is selected from U according to a given sampling design. The first- and the second-order inclusion probabilities associated with
Properties of the proposed estimator
A model-based estimator for the population mean has been defined in Section 2. We now study several properties of this estimator, which may be important in practice.
- The proposed estimator is linear in the Y's: the weights are independent of the Y's, and B depends on the sample s only through the variables x and z.
- The proposed estimator is data intensive in the sense that we must know the values of x for all the units of the population. This assumption is usual in social surveys with information
Simulation study
The theoretical comparison of the proposed alternative estimators is not a simple issue because they rely on different principles: on the one hand, prediction (or model-based) theory, and on the other, probability sampling (or design-based) theory. Little [11] examined some aspects of the debate between design-based and model-based inference for sample surveys. Model-based estimators often have a smaller variance than design-based competitors, especially for small samples where the latter
Discussion
A proportion is a commonly used statistic for summarising data. The customary proportion estimator does not involve auxiliary information at the estimation stage, and so the aim of this paper is to incorporate this auxiliary information efficiently in the presence of missing data.
The proposed estimator performs very well in the simulation studies against the two competing estimators, achieving increased efficiency. We note that the estimator was compared in [3]
Acknowledgements
This work is partially supported by Ministerio de Educación y Ciencia (contract Nos. MTM2009-10055 and MTM2012-35650).
References (27)
- et al. Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Appl. Math. Comput. (2010)
- et al. Estimating population proportions in the presence of missing data. J. Comput. Appl. Math. (2013)
- et al. Incorporating the auxiliary information available in variance estimation. Appl. Math. Comput. (2005)
- et al. A new method for estimating variance from data imputed with ratio method of imputation. Stat. Probab. Lett. (2006)
- et al. An improved exponential estimator of finite population mean in simple random sampling using an auxiliary attribute. Appl. Math. Comput. (2011)
- Efficient estimators of population mean using auxiliary attributes. Appl. Math. Comput. (2012)
- et al. An improved estimator using two auxiliary attributes. Appl. Math. Comput. (2013)
- et al. Improved estimation of population mean in simple random sampling using information on auxiliary attribute. Appl. Math. Comput. (2012)
- et al. On the estimation of the general parameter. Comput. Stat. Data Anal. (2008)
- Categorical Data Analysis (2002)
- Calibration estimators in survey sampling. J. Am. Stat. Assoc.
- Estimation of a proportion with survey data. J. Stat. Edu.
- Ejercicios y Prácticas de Muestreo en Poblaciones Finitas