Resampling methods for ranked set samples

https://doi.org/10.1016/j.csda.2005.10.010

Abstract

When measuring units are expensive or time consuming, while ranking them can be done easily, it is known that ranked set sampling (RSS) is preferred to simple random sampling (SRS). Available results for RSS are developed under specific parametric assumptions or are asymptotic in nature, with few results available for finite size samples when the underlying distribution of the observed data is unknown. We investigate the use of resampling techniques to draw inferences on population characteristics. To obtain standard error and confidence interval estimates we discuss and compare three methods of resampling a given ranked set sample. Chen et al. (2004. Ranked Set Sampling: Theory and Applications. Springer, New York) suggest a natural method to obtain bootstrap samples from each row of a RSS. We prove that this method is consistent for a location estimator. We propose two other methods that are designed to obtain more stratified resamples from the given sample. Algorithms are provided for these methods. We recommend a method that obtains a bootstrap RSS from the observations. We prove several properties of this method, including consistency for a location parameter. We define two types of L-estimators for RSS and obtain expressions for their exact moments. We discuss an application to obtain confidence intervals for the Winsorized mean of a RSS.

Introduction

Ranked set sampling (RSS), introduced by McIntyre (1952), has many applications in ecological and environmental studies (e.g., Dell and Clutter, 1972; Al-Saleh and Zheng, 2002), reliability theory (Kvam and Samaniego, 1994) and medical studies (Samawi and Al-Sagheer, 2001). RSS is useful when a measurement of interest is difficult or expensive to obtain, but units can easily be ordered by some means without actual quantification. The RSS procedure is briefly described as follows.

RSS is a two-stage sampling procedure. In the first stage, units are identified and ranked; in the second stage, measurements are taken from a fraction of the ranked units. Let $mk^2$ units be randomly identified from the population. The units are then randomly divided into $k$ groups of $mk$ units each. In the first group, units are further randomly divided into $m$ subgroups of size $k$. Units in each of the $m$ subgroups are ordered by any means other than actual quantification, and an actual measurement is taken from the unit having the lowest rank within each subgroup. The resulting measurements from the first group are labeled $x_{(1)1}, \ldots, x_{(1)m}$. This step is repeated for the second group, but this time the actual measurement is taken from the unit having the second lowest rank within each subgroup. The procedure continues until all $k$ groups are processed, so that in the $r$th group the actual measurement is taken from the unit having the $r$th lowest rank within each subgroup, yielding measurements $x_{(r)1}, \ldots, x_{(r)m}$. The resulting ranked set sample is denoted by $\{x_{(r)j};\ r = 1, \ldots, k;\ j = 1, \ldots, m\}$. Note that the $n = mk$ resulting measurements are independently, but not identically, distributed.
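The two-stage procedure above can be sketched in a short simulation. This is only an illustration under assumptions that are not part of the original text: ranking is taken to be perfect (units are ordered by their true values, which would not be available in the field), the population is standard normal, and the names `ranked_set_sample` and `draw` are hypothetical.

```python
import random

def ranked_set_sample(draw, k, m, rng):
    """Simulate a ranked set sample with k rows (ranks) and m measurements per row.

    draw(rng) returns one unit's value. Each measurement uses a fresh
    subgroup of k units, so m * k**2 units are drawn in total.
    Assumption: perfect ranking, i.e. units are ordered by their true values.
    """
    sample = []
    for r in range(k):                      # r-th group: keep the unit of rank r + 1
        row = []
        for _ in range(m):                  # m subgroups of size k in each group
            subgroup = sorted(draw(rng) for _ in range(k))
            row.append(subgroup[r])         # only this unit is actually "measured"
        sample.append(row)
    return sample

rng = random.Random(0)
# Hypothetical population: standard normal values.
rss = ranked_set_sample(lambda g: g.gauss(0.0, 1.0), k=3, m=4, rng=rng)
for rank, row in enumerate(rss, start=1):
    print(rank, [round(x, 2) for x in row])
```

Each row holds observations of a single order statistic, so the rows are mutually independent but follow different distributions, which is the structure any resampling scheme for RSS has to respect.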

Existing results show that one can often improve the accuracy of the analysis using RSS. The available results for RSS focus on inferences on population characteristics either under specific parametric assumptions or through asymptotic results, with few results available for finite size samples when the underlying distribution of the observed data is unknown. For applications with small $m$, however, asymptotic inference may not be valid, and for a complex statistic $\hat{\theta}_n$ with sampling distribution $H_{n,F}(t)$, the standard error may not be known. The bootstrap offers an alternative approach: it estimates $H_{n,F}(t)$ by replacing $F$ with its estimate $F_n$, so that the bootstrap estimate of $H_{n,F}(t)$ is $H_{n,F_n}(t)$.
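In practice the bootstrap distribution is usually approximated by Monte Carlo: draw many resamples from the empirical distribution and recompute the statistic on each. The sketch below shows this for the ordinary simple random sample case only, not for RSS. The function name `bootstrap_replicates`, the simulated data and the number of resamples are illustrative assumptions.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_boot, rng):
    """Recompute `statistic` on n_boot resamples drawn with replacement from `data`.

    The replicates approximate the sampling distribution of the statistic
    when the unknown F is replaced by the empirical distribution of `data`.
    """
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(30)]   # hypothetical SRS of size 30
reps = bootstrap_replicates(data, statistics.mean, n_boot=500, rng=rng)
se_hat = statistics.stdev(reps)                   # bootstrap standard error estimate
print(round(se_hat, 3))
```

The standard deviation of the replicates serves as a standard error estimate, and their quantiles can be used for percentile-type confidence intervals.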

Bootstrap is a viable approach to obtain the sampling distribution of test statistics in SRS, and it is important to study its use with RSS. Several methods of bootstrapping a RSS suggest themselves. What are the algorithms of these methods? Which algorithm has better coverage probability and produces more accurate confidence intervals? We study the use of bootstrap to draw inference under RSS. Chen et al. (2004) introduced the method of bootstrapping a RSS row-wise. Hui et al. (2005) considered bootstrapping as a way to construct confidence intervals for the population mean estimated via linear regression under RSS. To motivate the use of resampling in RSS and to illustrate these resampling methods, we focus our investigation on the trimmed mean as a member of a class of L-estimators in RSS. Simulation results are given for the 20% sample trimmed mean for symmetric distributions and the 10% one-sided sample trimmed mean for a skewed distribution.

The article is arranged as follows. Section 2 describes and discusses the properties of three methods of bootstrap: BRSSR (bootstrap RSS by row), BRSS (bootstrap RSS), and MRBRSS (mixed row bootstrap RSS). Section 3 describes linear estimators under RSS and presents the results of a simulation study to compare the three methods. An application using these resampling methods is discussed in Section 4. Summary and concluding remarks are given in Section 5.
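Of the three methods, the row-wise scheme (BRSSR) is the simplest to illustrate: each of the k rows is resampled separately, so a resample keeps m observations per rank. The sketch below assumes that each row is resampled with replacement and with equal probability, as in the ordinary bootstrap; the data values and the names `brssr_resample` and `rss_mean` are made up for illustration, and the BRSS and MRBRSS algorithms are not reproduced here.

```python
import random

def brssr_resample(rss, rng):
    """Row-wise bootstrap: resample each row of the RSS with replacement.

    `rss` is a list of k rows, where row r holds the m measurements
    taken from units of rank r + 1. Row sizes are preserved.
    """
    return [rng.choices(row, k=len(row)) for row in rss]

def rss_mean(rss):
    """Mean of all n = m * k measurements in a (re)sample."""
    values = [x for row in rss for x in row]
    return sum(values) / len(values)

# Made-up ranked set sample with k = 3 rows and m = 4 measurements per row.
rss = [
    [-1.2, -0.7, -0.9, -1.5],
    [0.1, -0.2, 0.3, 0.0],
    [1.1, 0.8, 1.6, 0.9],
]
rng = random.Random(2)
replicates = [rss_mean(brssr_resample(rss, rng)) for _ in range(1000)]
print(round(min(replicates), 3), round(max(replicates), 3))
```

Because no observation ever moves between rows, every resample has the same rank structure as the original sample.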

Section snippets

Resampling methods for ranked set samples

Given $X = \{X_1, \ldots, X_n\}$ drawn from an unknown distribution $F$, a SRS $X^* = \{X_1^*, \ldots, X_n^*\}$ can be obtained by randomly sampling $n$ units from $X$ with replacement and equal probability. A ranked set sample, on the other hand, has a complex structure, one that contains information about measurements as well as their partial ordering. A RSS is composed of $k$ independent random samples that are drawn from different distributions. In this respect, we may regard RSS as a stratified sampling design, for which the standard

Linear estimators

Let $Z_{(1)} \le \cdots \le Z_{(n)}$ be the order statistics of $n$ independently and identically distributed variates from a continuous distribution $F$. An L-estimator is defined as $V_n = \sum_{r=1}^{n} c_r Z_{(r)}$, where $c_1, \ldots, c_n$ are known constants. Depending on how these weighting constants are defined, both the usage of $V_n$ and its asymptotic properties differ. Asymptotic properties of $V_n$ can be found in Serfling (1980). Unfortunately, it is difficult to obtain analytical expressions for the standard error of L-estimators.
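As a concrete instance of this definition, a symmetric trimmed mean is an L-estimator whose weights are zero in both tails and equal elsewhere. The sketch below covers the ordinary i.i.d. case only, not the RSS versions of the estimator. The helper names are hypothetical, and trimming floor(alpha * n) observations from each tail is an assumed convention.

```python
def l_estimator(values, weights):
    """Weighted sum of order statistics: sum over r of c_r * Z_(r)."""
    ordered = sorted(values)
    if len(ordered) != len(weights):
        raise ValueError("need exactly one weight per observation")
    return sum(c * z for c, z in zip(weights, ordered))

def trimmed_mean_weights(n, alpha):
    """Weights c_1..c_n of a symmetric trimmed mean.

    Assumption: g = floor(alpha * n) observations are dropped from each tail
    and the remaining n - 2g observations receive equal weight.
    """
    g = int(alpha * n)
    kept = n - 2 * g
    if kept <= 0:
        raise ValueError("alpha trims away every observation")
    return [0.0] * g + [1.0 / kept] * kept + [0.0] * g

data = [10.0, 1.0, 2.0, 3.0, -5.0]              # made-up values
weights = trimmed_mean_weights(len(data), 0.2)  # drops one value from each tail
print(l_estimator(data, weights))               # averages the middle values 1, 2, 3
```

Other choices of the weights give other members of the class; for example, a single weight of 1 at the middle position yields the sample median for odd n.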

Application

Mode et al. (1999) described an RSS application that measures habitat sizes, which are known to be linked to salmon production. Many of these habitats are near streams and forests in the Pacific Northwest, and measuring habitat areas is labor intensive and time consuming. Unfortunately, the actual dataset is unavailable. However, Mode et al.'s paper notes that extreme-valued habitat sizes occur and that the LogNormal distribution may provide a good fit to the distribution of habitat sizes. We use a

Concluding remarks

RSS is concerned with samples collected from the field under cost, time or other logistic restrictions. Once a sample is collected, however, one must be able to draw inference from it. In this article, we assume a RSS is available and address the issue of obtaining standard errors for RSS estimates based on the bootstrap. We discuss three methods of bootstrapping a RSS. The row-wise bootstrap, BRSSR, obtains bootstrap resamples by sampling m observations from each of k rows. True to its stratified nature,

Acknowledgements

We thank the editor and two anonymous referees whose helpful comments and suggestions improved the paper.

References (16)

  • Z. Chen. On ranked-set sample quantiles and their applications. J. Statist. Plann. Infer. (2000)
  • M.F. Al-Saleh et al. Estimation of bivariate characteristics using ranked set sampling. Austral. New Zealand J. Statist. (2002)
  • P.J. Bickel et al. Some asymptotic theory for the bootstrap. Annals Statist. (1981)
  • Z. Chen et al. Ranked Set Sampling: Theory and Applications (2004)
  • J.R. Dell et al. Ranked set sampling theory with order statistics background. Biometrics (1972)
  • P. Hall. The Bootstrap and Edgeworth Expansion (1992)
  • T. Hui et al. Bootstrap confidence interval estimation of mean via ranked set sampling linear regression. J. Statist. Comput. Simulation (2005)
  • A.D. Hutson et al. The exact bootstrap mean and variance of an L-estimator. J. Roy. Statist. Soc. Ser. B (2000)
There are more references available in the full text version of this article.

1 Research was supported in part by a grant from the USEPA and was completed while visiting the Center for Statistical Ecology and Environmental Statistics, Department of Statistics, The Pennsylvania State University.
