Bayesian estimation based on trimmed samples from Pareto populations

https://doi.org/10.1016/j.csda.2005.11.010

Abstract

Trimmed samples are widely employed in several areas of statistical practice, especially when some sample values at either or both extremes might have been contaminated. The problem of estimating the inequality and precision parameters of a Pareto distribution based on a trimmed sample and prior information is considered. From an inferential viewpoint, the problem of finding the highest posterior density (HPD) estimates of the Pareto parameters is discussed. The existence and uniqueness of the HPD estimates are established under mild conditions; explicit and accurate lower and upper bounds are also provided. Adopting a decision-theoretic perspective, several Bayesian estimators for standard loss functions are presented. In addition, two-sided and HPD credibility intervals for each Pareto parameter and joint HPD credibility regions for both parameters are derived, which have the corresponding frequentist confidence level in the noninformative case. Finally, an illustrative example concerning annual wage data is included.

Introduction

Trimmed samples are commonly used for several applications including some robust estimation procedures (e.g., Prescott, 1978, Huber, 1981, Healy, 1982, Welsh, 1987, Wilcox, 1995). For instance, trimmed means make up a well-known class of robust location estimates that range from the sample mean to the sample median. The trimming removes a fixed proportion of the outlying sample values, i.e., certain proportions q1 and q2 of the smallest and largest observations are eliminated. On the other hand, in various situations some data may not be observed due to time limitations and other restrictions on data collection. In particular, a known number of observations in an ordered sample are missing at either end (single censoring) or at both ends (double censoring) in failure censored experiments. Specifically, double censoring has been widely treated in the literature (e.g., Ng, 1976, Healy, 1978, Prescott, 1979, Schneider, 1984, LaRiccia, 1986, Schneider and Weissfeld, 1986, Escobar and Meeker, 1994, Upadhyay and Shastri, 1997; Fernández, 2000a, Fernández, 2000b; Fernández, 2004, Fernández et al., 2002, Raqab and Madi, 2002, Ali Mousa, 2003).

The Pareto distribution provides a population model which has a wide variety of applications in many fields. Concretely, applications are seen in insurance risk studies, property values, income, stock price fluctuations, migration, sizes of cities and firms, word frequencies, availability of natural resources, service times in queuing systems, error clustering in communications circuits and business failures. The probability density function (pdf) and cumulative distribution function (cdf) of a random variable X having a Pareto law P(α,τ) with inequality (or shape) parameter α > 0 and precision parameter τ > 0 are given by

f(x; α,τ) = τα(τx)^{−α−1}  and  F(x; α,τ) = 1 − (τx)^{−α},  x ≥ 1/τ.

From the Bayesian perspective, (α,τ) is not merely an unknown fixed quantity but rather a random variable that is characterized by some prior pdf. This point of view has been considered by many authors (e.g., Lwin, 1972, Arnold and Press, 1983, Arnold and Press, 1986, Arnold and Press, 1989, Geisser, 1984, Geisser, 1985, Nigm and Hamdy, 1987, Tiwari and Zalkikar, 1990, Tiwari et al., 1996, Upadhyay and Shastri, 1997, Ali Mousa, 2003). The use of a Bayesian approach allows both sample and prior information to be incorporated into the statistical analysis, which will improve the quality of the inferences and permit a reduction in sample size. The decision-theoretic viewpoint takes into account additional information concerning the possible consequences of our decisions (quantified by a loss function).
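As a quick numerical companion to the density and distribution function above, the short sketch below evaluates them directly; the function names `pareto_pdf` and `pareto_cdf` are illustrative, not from the paper.

```python
def pareto_pdf(x, alpha, tau):
    """Density of the P(alpha, tau) law: tau*alpha*(tau*x)^(-alpha-1) for x >= 1/tau."""
    if x < 1.0 / tau:
        return 0.0          # outside the support
    return tau * alpha * (tau * x) ** (-alpha - 1)

def pareto_cdf(x, alpha, tau):
    """Distribution function: 1 - (tau*x)^(-alpha) for x >= 1/tau."""
    if x < 1.0 / tau:
        return 0.0
    return 1.0 - (tau * x) ** (-alpha)

# Example with alpha = 2, tau = 0.5 (support x >= 2):
print(pareto_cdf(4, 2.0, 0.5))   # -> 0.75
print(pareto_pdf(4, 2.0, 0.5))   # -> 0.125
```

Note that a small α means a heavy tail (high inequality), while τ fixes the left endpoint 1/τ of the support.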

There may be situations where it is convenient to choose a single value as an estimate of (α,τ). This paper discusses the problem of finding the Bayesian estimates of the Pareto parameters based on a trimmed sample, i.e., when some of the smallest and largest sample values are not available or have been discarded, applying both inferential and decision-theoretic viewpoints. The structure of the article is as follows. The next section presents the likelihood function and prior pdf for (α,τ). The joint posterior density of (α,τ) and its marginals are derived in Section 3. The existence and uniqueness of the highest posterior density (HPD) estimates of α and τ are shown in Section 4 under mild conditions. Precise lower and upper bounds are also provided in closed forms. Section 5 presents several Bayesian estimators from a decisional perspective. Section 6 is devoted to Bayesian interval and region estimation. An illustrative example is given in Section 7. The paper concludes with several comments.

Section snippets

Sample and prior information

Consider a random sample of size n from a P(α,τ) distribution (α and τ being unknown), and let x_{r:n}, …, x_{s:n} be the ordered observations remaining when the (r−1) = nq1 smallest and (n−s) = nq2 largest observations have been discarded or censored, where 1 ≤ r ≤ s ≤ n. Then, given the trimmed sample x = (x_{r:n}, …, x_{s:n}), the likelihood function of (α,τ) can be written as

L(α,τ | x) = [n!/((r−1)!(n−s)!)] {F(x_{r:n}; α,τ)}^{r−1} {1 − F(x_{s:n}; α,τ)}^{n−s} ∏_{i=r}^{s} f(x_{i:n}; α,τ).

Hence, in accord with (1) and (2), the likelihood becomes L(α,τ | x) = n! α^m {1 − (x_{r:n}τ)^{−α}}^{r−1}
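The generic doubly censored likelihood above can be evaluated numerically before any Pareto-specific simplification. The helper below (a hypothetical name, `trimmed_loglik`) computes its logarithm term by term, using `lgamma` for the factorial constant; it is a sketch, not code from the paper.

```python
import math

def trimmed_loglik(alpha, tau, x_trim, n, r, s):
    """Log-likelihood of (alpha, tau) for the ordered trimmed Pareto sample
    x_trim = (x_{r:n}, ..., x_{s:n}), following the doubly censored form above."""
    x_r, x_s = x_trim[0], x_trim[-1]
    if x_r < 1.0 / tau:
        return float("-inf")                          # outside the Pareto support
    logF_r = math.log1p(-(tau * x_r) ** (-alpha))     # log F(x_{r:n}; alpha, tau)
    logS_s = -alpha * math.log(tau * x_s)             # log{1 - F(x_{s:n}; alpha, tau)}
    log_f = sum(math.log(tau * alpha) - (alpha + 1) * math.log(tau * xi)
                for xi in x_trim)                     # sum of log f(x_{i:n})
    const = math.lgamma(n + 1) - math.lgamma(r) - math.lgamma(n - s + 1)
    return const + (r - 1) * logF_r + (n - s) * logS_s + log_f
```

With r = 1 and s = n the censoring terms vanish and the expression reduces to the complete-sample log-likelihood, which gives a convenient correctness check.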

Posterior knowledge

The posterior distribution of (α,τ) presents a complete description of what is known about the Pareto parameters from the sample information and prior knowledge, expressed, respectively, by (3) and (6).

The conditions b ≥ x_{r:n}, m + c > 0 and W(x) = log{b^a U(x)/x_{r:n}^{n−r+1+a}} + d > 0 will be assumed to hold hereafter, unless otherwise specified. Then, utilizing Bayes’ theorem, the posterior pdf of (α,τ), given x, is deduced to be

π(α,τ | x) = {αW(x)}^{m+c} {1 − (τx_{r:n})^{−α}}^{r−1} τ^{−(n−r+1+a)α−1} / [Γ(m+c) B(r, n−r+1+a) {b^a exp(d) U(x)}^{α}]  for α > 0

Bayesian estimation: an inferential approach

Unless the researcher incorporates further information on the consequences of incorrect choices of (α,τ) (i.e., a loss function), there would seem to be no other suitable criterion for choosing a single value to estimate (α,τ) than to use the most likely value of posterior density (8), i.e., the posterior mode. The HPD estimate of (α,τ), (α̂, τ̂) ≡ (α̂(x), τ̂(x)), is therefore an appropriate choice from a Bayesian inferential viewpoint, i.e., when there is no compelling reason to accept some specific

Bayesian estimation: a decision-theoretic perspective

From a decision-theoretic point of view, in order to select a single value as representing the “best” estimate of an unknown parameter, a convenient loss function must be specified. Assuming the commonly used squared-error loss function, L(α, α̃) = (α − α̃)², the Bayes estimate of α (i.e., the value α̃ that minimizes the posterior expected loss) is the mean of the posterior density of α, given x. As α | x ∼ G(m+c, W(x)), this estimate is given by

α̃ ≡ α̃(x) = E[α | x] = (m + c)/W(x).

The Bayes estimate of τ under
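Because α | x follows a gamma law, the squared-error Bayes estimate is a one-line computation. The sketch below uses illustrative names and assumes m, c and W(x) have already been computed; it adds a Monte Carlo sanity check with the standard library's `random.gammavariate` (which is parameterized by shape and scale, so the scale is 1/W).

```python
import random

def bayes_alpha_squared_error(m, c, W):
    """Bayes estimate of alpha under squared-error loss: the posterior mean.
    Since alpha | x ~ Gamma(shape = m + c, rate = W(x)), the mean is (m + c)/W(x)."""
    if m + c <= 0 or W <= 0:
        raise ValueError("requires m + c > 0 and W(x) > 0")
    return (m + c) / W

# Monte Carlo sanity check of the closed form (values of m, c, W are made up):
rng = random.Random(42)
m, c, W = 18, 2.0, 10.0
draws = [rng.gammavariate(m + c, 1.0 / W) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)
print(bayes_alpha_squared_error(m, c, W))   # -> 2.0
```

The empirical mean of the draws should agree with (m + c)/W up to Monte Carlo error.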

Credibility intervals and regions

In practice, the researcher often has to find intervals that enclose unknown parameters with high credibility. For given ε ∈ (0,1), it is clear that (α*_{ε/2}, α*_{1−ε/2}) is the two-sided (equitailed) 100(1−ε)% credibility interval for α, whereas the values α*_{ε} and α*_{1−ε} are called the (one-sided) lower and upper 100(1−ε)% credibility limits (or bounds) for α, respectively. Nevertheless, in choosing a credibility set for an unknown parameter, it is usually desirable to minimize its size. In our case,
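Since α | x ∼ G(m+c, W(x)), the equitailed interval is simply a pair of gamma quantiles. The sketch below approximates them by Monte Carlo using only the standard library; this is a hypothetical helper for the equitailed case, not the paper's construction of minimum-size (HPD) sets.

```python
import random

def equitailed_interval_alpha(m, c, W, eps, n_draws=200_000, seed=0):
    """Monte Carlo approximation of the equitailed 100(1-eps)% credibility
    interval for alpha, where alpha | x ~ Gamma(shape = m + c, rate = W(x))."""
    rng = random.Random(seed)
    # gammavariate takes (shape, scale); the scale is 1/rate.
    draws = sorted(rng.gammavariate(m + c, 1.0 / W) for _ in range(n_draws))
    lo = draws[int(eps / 2 * n_draws)]          # empirical eps/2 quantile
    hi = draws[int((1 - eps / 2) * n_draws)]    # empirical 1 - eps/2 quantile
    return lo, hi
```

In a real analysis one would instead use exact gamma quantiles (e.g., `scipy.stats.gamma.ppf`); the Monte Carlo version only serves to keep the sketch dependency-free.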

An illustrative example

Dyer (1981) reported annual wage data (in multiples of 100 US dollars) of a random sample of 30 production line workers in a large industrial firm, for which the Pareto distribution appeared to be adequate. For illustrative purposes, the six smallest and six largest sample values will be discarded, i.e., the 40% symmetrically trimmed sample

(107, 107, 108, 108, 111, 112, 112, 112, 115, 115, 116, 119, 119, 119, 123, 125, 128, 132)

will represent the available data x. Hence n = 30, q1 = q2 = 0.2, r = 7, s = 24, m = 18, x_{r:n} = 107
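The trimming bookkeeping of the example can be checked in a few lines (variable names are illustrative):

```python
# Dyer's (1981) wage data after discarding the six smallest and six largest of n = 30
x = [107, 107, 108, 108, 111, 112, 112, 112, 115, 115,
     116, 119, 119, 119, 123, 125, 128, 132]
n, q1, q2 = 30, 0.2, 0.2
r = int(n * q1) + 1      # r - 1 = n*q1 = 6 values trimmed from below
s = n - int(n * q2)      # n - s = n*q2 = 6 values trimmed from above
m = s - r + 1            # size of the retained (trimmed) sample
print(n, r, s, m, x[0])  # -> 30 7 24 18 107
```

These values match the quantities n = 30, r = 7, s = 24, m = 18 and x_{r:n} = 107 quoted in the text.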

Concluding remarks

Classical statistics usually considers current experimental data as its only source of relevant information. Bayesian statistics also utilizes the subjective and objective prior information accumulated from past or external experience, whereas Bayesian decision theory augments the inferential knowledge of the statistical study by also incorporating assessments of (potential) future consequences of alternative decisions, expressed through a loss function.

In many statistical studies, some extreme

Acknowledgement

I would like to thank the editor and three anonymous reviewers for valuable comments.

References (34)

  • L.A. Escobar et al. Algorithm AS 292: Fisher information matrix for the extreme value, normal and logistic distributions and censored data. Appl. Statist. (1994)
  • A.J. Fernández. On maximum likelihood prediction based on type II doubly censored exponential data. Metrika (2000)
  • A.J. Fernández. One- and two-sample prediction based on doubly censored exponential data and prior information. Test (2004)
  • A.J. Fernández et al. Computing maximum likelihood estimates from type II doubly censored exponential data. Statist. Methods Appl. (2002)
  • S. Geisser. Predicting Pareto and exponential observables. Canad. J. Statist. (1984)
  • M.J.R. Healy. A mean difference estimator of standard deviation in symmetrically censored samples. Biometrika (1978)
  • M.J.R. Healy. Algorithm AS 180: a linear estimator of standard deviation in symmetrically trimmed normal samples. Appl. Statist. (1982)