A Bayesian method for computing sample size and cost requirements for stratified random sampling of pond water

https://doi.org/10.1016/j.envsoft.2005.03.007

Abstract

Estimating average environmental pollution concentrations and their variance is a fairly straightforward task in stratified random sampling. A more challenging problem is introducing cost into this environmental model. Traditional statistical techniques have incorporated the cost of sampling within a stratum, together with stratum weights, to determine the stratum sample sizes and the overall required sample size. Informative prior distributions, used to determine a more coherent variance in the system, yield a more precise Bayesian approach to the sample size and cost calculations. This approach results in a more cost-efficient sampling strategy, both for a pre-specified margin of error for the sample mean and for the more complicated situation of correlation among the stratum samples.

Introduction

The traditional statistical approaches to calculating overall and stratum sample sizes in a stratified random sample are fairly straightforward. The procedure becomes somewhat more complicated once cost and possible correlation among the stratum samples are incorporated. Applications of such approaches employing several monitoring strategies are well known, as in Thornton et al. (1982), Nelson and Ward (1981), Reckhow and Chapra (1983), and Gilbert (1987). Our focus here is a pond water environment in which the strata are depth levels. The stratum weights and the overall variance of the sample mean are the main components of the statistics we derive to determine the sample size within each stratum. The three situations considered are a pre-specified margin of error, a pre-specified fixed cost, and correlation among the stratum samples. Cost efficiency is seen in most situations with the introduction of the Bayesian methodology developed by Dayal and Dickey (1976), Bartolucci and Dickey (1977), Birch and Bartolucci (1983), Baldi and Long (2001), and Bartolucci et al. (1998). The thrust of the Bayesian approach is the derivation of the posterior estimate of the variance, based on coherent inference on a normal variance in the Behrens–Fisher context of Dayal and Dickey (1976) and Bartolucci et al. (1998). The traditional (classical) and Bayesian methodologies are compared using summary data on the phosphorus concentration in a pond water sampling environment.

The motivation is to ensure a design that conforms to the cost-effectiveness guidelines recommended by the National Academy of Sciences (1977, 2004) and Bartram and Balance (2001). Designs that incorporate a cost analysis will either achieve a specified level of effectiveness at minimal cost or a specified effectiveness at a specified cost. Modeling the stratum variance with a Bayesian analysis allows further cost savings in the overall approach. The approach can be applied to sampling contaminants from well water or pond water, with special attention to agricultural runoff as seen in Atzeni et al. (2001). Gilbert et al. (1975) also stressed the importance of this approach when cost considerations demanded attention in sampling radioactive pollutants from desert sites in Nevada. Our proposed technique can be applied to the sampling plans of Ward et al. (1990), among others. Historically, then, many environmental sampling applications either require cost considerations or can be refined by them.

In Section 2 below we describe the traditional sampling setup, providing the basic statistics such as the sample mean, variance, depth stratum sizes and weights, as well as the overall population size. In Section 3 we incorporate the methodology for computing the optimum sample size under a pre-specified margin of sampling error (PMOE). We then introduce cost into the approach at a pre-specified fixed cost per stratum, for independent strata as well as correlated strata.

In Section 4 we introduce the Bayesian considerations in our methodology, especially as applied to the stratum variance, which affects the overall final cost. In Section 5 we apply the method to an example of sampling phosphorus concentration in pond water at 5 depth strata and demonstrate the conditions under which the Bayesian methodology reduces cost.

Traditional setup

Let $N$ = total number of population units in the target population. $N_h$ is the number of population units within each of the $h$ strata, $h = 1, \ldots, L$. Clearly $N = \sum_{h=1}^{L} N_h$. With reference to the sample, $n$ = total number of sampling units in the target sample. Likewise as above, $n = n_1 + n_2 + \cdots + n_L = \sum_{h=1}^{L} n_h$. We define the weight of the stratum, $h$, as $W_h = N_h/N$. The mean, $\mu$, of the population of $N$ units is:
$$\mu = \frac{1}{N}\sum_{h=1}^{L} N_h \mu_h = \sum_{h=1}^{L} W_h \mu_h,$$
where $\mu_h$ is the mean of the $h$th stratum and is estimated by
$$m_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_{hi},$$
where $x_{hi}$ = $i$th
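The quantities defined above can be sketched in a few lines of Python. This is only an illustration: it assumes the usual stratified estimator $m_{st} = \sum_h W_h m_h$, which is not visible in the truncated text, and the function names and the two-stratum data are hypothetical.

```python
def stratum_weights(stratum_sizes):
    """Return W_h = N_h / N for each stratum, where N is the sum of the N_h."""
    total = sum(stratum_sizes)
    return [n_h / total for n_h in stratum_sizes]


def stratified_mean(stratum_sizes, samples):
    """Estimate mu as sum_h W_h * m_h (assumed estimator, see lead-in).

    samples[h] holds the measurements x_hi drawn from stratum h.
    """
    weights = stratum_weights(stratum_sizes)
    stratum_means = [sum(x) / len(x) for x in samples]  # m_h
    return sum(w * m for w, m in zip(weights, stratum_means))


# Hypothetical illustration: two depth strata holding 300 and 100 units.
sizes = [300, 100]
data = [[2.0, 4.0], [6.0, 10.0]]
m_st = stratified_mean(sizes, data)
```

Because each stratum mean is multiplied by its population share, a stratum with few samples but a large $N_h$ still contributes in proportion to its size.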

Computing the optimum n

An important aspect of stratified random sampling is determining how many samples are to be collected within each stratum. Gilbert (1987) has proposed a method for doing so that minimizes the variance $s^2(m_{st})$ in Eq. (5) for a pre-specified fixed cost per stratum, or that minimizes the value of $s^2(m_{st})$ under the condition of a pre-specified margin of error (PMOE). The PMOE is the value $d$ such that $d = |m_{st} - \mu|$, or the minimal absolute distance we wish to tolerate between the sample mean and
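Since the equations themselves are not shown in this excerpt, the following sketch falls back on the textbook cost-weighted allocation found in Cochran (1977); it may differ in detail from the article's Eqs. (5)–(14). The per-sample stratum costs `costs`, the normal quantile `z`, and the conversion of the margin of error into a target variance $V = (d/z)^2$ are assumptions made for illustration.

```python
import math


def allocation_fractions(weights, sds, costs):
    """Fraction n_h / n for each stratum, proportional to W_h * s_h / sqrt(c_h)."""
    terms = [w * s / math.sqrt(c) for w, s, c in zip(weights, sds, costs)]
    total = sum(terms)
    return [t / total for t in terms]


def total_n_for_margin(weights, sds, costs, pop_size, d, z=1.96):
    """Total sample size n for a pre-specified margin of error d.

    Assumes a target variance V = (d / z)**2 for the stratified mean and
    includes the finite-population term (1/N) * sum_h W_h * s_h**2.
    """
    target_var = (d / z) ** 2
    a = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    b = sum(w * s / math.sqrt(c) for w, s, c in zip(weights, sds, costs))
    fpc = sum(w * s ** 2 for w, s in zip(weights, sds)) / pop_size
    return a * b / (target_var + fpc)


# Hypothetical inputs: two equally weighted strata.
fractions = allocation_fractions([0.5, 0.5], [1.0, 2.0], [1.0, 4.0])
n_total = total_n_for_margin([0.5, 0.5], [1.0, 2.0], [1.0, 4.0],
                             pop_size=1000, d=0.2, z=2.0)
```

In this form a stratum receives more samples when it is larger, more variable, or cheaper to sample; the result would be rounded up and then split across strata using the fractions.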

Bayesian considerations

Examining Eqs. (7), (10), (12) and (14), we see that they all involve the expression for the stratum variance, $s_h^2$. We re-evaluated these expressions by adding the prior structure for the variance of Dayal and Dickey (1976) and Bartolucci et al. (1998) and then estimating the posterior expression for the normal variance, $\sigma^2$. We assumed an underlying normal distribution with both mean, $\mu$, and variance, $\sigma^2$, unknown. In this context we define the likelihood function for $n$ observations: $l(\mu,\sigma) \propto (\sigma^2)^{-n/2} \exp[-\frac{1}{2\sigma^2}(n(\mu$
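The article's own prior structure is not visible in this excerpt, so the sketch below substitutes a standard conjugate setup purely to illustrate how a prior on the variance changes the estimate that feeds the sample size formulas. It assumes a flat prior on $\mu$ and an Inverse-Gamma($a$, $b$) prior on $\sigma^2$; this is not the Dayal and Dickey prior, and the hyperparameter names `a` and `b` are hypothetical. Under these assumptions the marginal posterior of $\sigma^2$ is Inverse-Gamma with shape $a + (n-1)/2$ and scale $b + (n-1)s^2/2$.

```python
def posterior_variance_mean(n, s2, a, b):
    """Posterior mean of sigma^2 for normal data with unknown mean.

    Assumed model (not the article's): flat prior on mu and an
    Inverse-Gamma(a, b) prior on sigma^2. n is the sample size and
    s2 the usual sample variance with divisor n - 1.
    """
    a_post = a + (n - 1) / 2.0
    b_post = b + (n - 1) * s2 / 2.0
    if a_post <= 1.0:
        raise ValueError("posterior mean of sigma^2 requires shape > 1")
    return b_post / (a_post - 1.0)


# Hypothetical stratum: 11 samples with s^2 = 4.0, prior mean b / (a - 1) = 2.0.
post_var = posterior_variance_mean(n=11, s2=4.0, a=3.0, b=4.0)
```

With these hypothetical numbers the posterior mean lies between the prior mean and the sample variance, so an informative prior centred below $s_h^2$ would shrink the variance estimate and, through the allocation formulas, the required sample size.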

Example

We wish to estimate the average phosphorus concentration (μg/100 ml) in pond water. The concentration of a 100 ml aliquot from each 1 l sample will be measured. The statistics for a classical representation of the data using the pre-specified margin of error (PMOE) $d = 0.2$ are given in Table 1. A PMOE of 0.2 is a fairly reasonable choice in environmental sampling (see Gilbert, 1987). The pond has 5 depth strata, and $N$ = total number of 100 ml water samples in the pond. $N_h$ is the number of

Discussion

Overall, compared with the classical sampling analysis for both the pre-specified margin of error approach and the correlational approach, the Bayesian analysis reduced the number of required samples and thus lowered the cost, especially when realistic (empirical) prior hyperparameters were used. There was also no serious impact on the posterior standard error of the estimates of the mean concentration. However, there were no real differences between the classical and

References (19)

  • A.A. Bartolucci et al. A Bayesian Behrens–Fisher solution to a problem in taxonomy. Environmental Modeling and Software (1998)
  • A.D. Aczel. Complete Business Statistics (1999)
  • M. Atzeni et al. A model to predict cattle feedlot runoff for effluent reuse applications. International Modeling and Simulation Society (2001)
  • P. Baldi et al. A Bayesian framework for the analysis of microarray data: regularized t-test and statistical inferences of gene changes. Bioinformatics (2001)
  • A.A. Bartolucci et al. Comparative Bayesian and traditional inferences for Gamma modeled survival data. Biometrics (1977)
  • J. Bartram et al. Water Quality Monitoring: a Practical Guide to the Design and Implementation of Fresh Water Quality Studies and Monitoring Programs (2001)
  • R. Birch et al. Determination of the hyperparameters of a prior probability model in survival analysis. Computer Programs in Biomedicine (1983)
  • W.G. Cochran. Sampling Techniques (1977)
  • H.H. Dayal et al. Bayes factors for Behrens–Fisher problems. Sankhya: The Indian Journal of Statistics, Series B (1976)
There are more references available in the full text version of this article.