Interfaces with Other Disciplines

Inferring the incidence of industry inefficiency from DEA estimates
Highlights
► We use Bayesian methodology to infer the incidence of inefficiency in an industry.
► We place minimal prior information on the nature of the DEA efficiency frontier.
► Our methodology applies to finite populations and sampling without replacement.
► We account for the fact that DEA estimates may misclassify the efficient frontier.
► Three empirical examples support the methodology, especially with low sample sizes.
Introduction
Data envelopment analysis (DEA) is among the most popular tools for measuring productive and cost efficiency. Originally developed by Charnes et al. (1978) to empirically quantify the distance functions posited by Debreu (1951) and Farrell (1957), DEA uses a linear programming algorithm to evaluate the efficiency of decision-making units (DMUs) on a 0–1 scale. A score of 0 implies that a DMU is completely inefficient, while a score of 1 indicates that the DMU produces at a point on the efficient frontier (whether that frontier is defined as an isoquant, a production possibilities frontier, or a cost frontier) and is thus fully efficient.
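The linear program behind these scores is standard; as a concrete illustration, the following minimal sketch solves the input-oriented, constant-returns-to-scale (CCR) problem with scipy. The two-DMU data at the end are our own illustrative example, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y, k):
    """Input-oriented CCR efficiency score for DMU k.

    X: (n, p) array of inputs; Y: (n, q) array of outputs.
    Solves: min theta  s.t.  X' lam <= theta * x_k,  Y' lam >= y_k,  lam >= 0.
    Returns theta in (0, 1]."""
    n, p = X.shape
    q = Y.shape[1]
    # Decision vector z = [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.zeros(1 + n)
    c[0] = 1.0
    # Input constraints: sum_j lambda_j * x_ij - theta * x_ik <= 0
    A_in = np.hstack([-X[k].reshape(p, 1), X.T])
    b_in = np.zeros(p)
    # Output constraints: -sum_j lambda_j * y_rj <= -y_rk
    A_out = np.hstack([np.zeros((q, 1)), -Y.T])
    b_out = -Y[k]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([b_in, b_out]),
                  bounds=[(0, None)] * (1 + n), method="highs")
    return res.fun

# Two DMUs, one input, one output: DMU 1 uses twice the input of DMU 0
# for the same output, so DMU 0 scores 1.0 and DMU 1 scores 0.5.
X = np.array([[1.0], [2.0]])
Y = np.array([[1.0], [1.0]])
```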
Using DEA to characterize efficiency is advantageous for several reasons. First, DEA does not have stringent data requirements; researchers need only collect data on the relevant inputs and outputs for each DMU. Second, DEA is “non-parametric”: it requires no functional form assumptions about the technology underlying the production function, nor, when estimating production efficiency, the assumption that production is cost minimizing. Instead, the efficient frontier is inferred from the observed input–output relationships in the data. Third, unlike alternatives such as stochastic frontier analysis (SFA), DEA generates efficiency estimates in a relatively straightforward way when DMUs produce multiple outputs.
However, DEA also has at least two major drawbacks, both stemming from the fact that it generates a deterministic frontier from a sample of data. First, DEA estimates tend to mismeasure efficiency in smaller finite samples, and in particular to over-predict efficiency, because a randomly collected sample may not contain enough fully efficient DMUs to accurately characterize the efficient production frontier. As such, the frontier calculated by DEA may lie below the true efficient production possibilities frontier: the most efficient DMUs in the sample, from which the estimated frontier is calculated (and which serve, in part or in whole, as “efficient peers” for the other DMUs in the sample), may not lie on the true production frontier (or, if the production process is input-oriented, on an isoquant). In fact, as a consequence of the analysis we present ahead, this “small sample” problem applies more generally: even a census of the population may not suffice to solve this DEA mismeasurement problem, which appears to be an underappreciated property of DEA.
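The over-prediction mechanism can be seen in a toy Monte Carlo of our own construction (a single-input, single-output, constant-returns world, not the paper's setting): whenever a small sample happens to contain no truly efficient DMU, the best inefficient DMU in the sample is flagged as efficient, so the apparent share of efficient DMUs exceeds the true share on average.

```python
import random

# Population: 150 DMUs, each summarized by its output per unit input;
# 30 are fully efficient (ratio 1.0), 120 draw ratios below 1.0.
random.seed(0)
population = [1.0] * 30 + [random.uniform(0.2, 0.99) for _ in range(120)]
true_share = 30 / 150

def apparent_efficient_share(sample):
    """Under constant returns with one input and one output, DEA deems a
    DMU efficient iff its ratio attains the maximum ratio in the sample."""
    best = max(sample)
    return sum(r == best for r in sample) / len(sample)

# Average apparent efficient share over many small samples drawn
# without replacement; it lies above the true population share.
draws = [apparent_efficient_share(random.sample(population, 10))
         for _ in range(2000)]
avg_apparent = sum(draws) / len(draws)
```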
Second, and arguably most important, DEA’s statistical foundation and stochastic properties are complex (Schmidt, 1985). Relatedly, Simar and Wilson (2007) note that because the efficient frontier is calculated relative to the DMUs in the data, DEA scores are serially correlated in an unknown and complicated manner. This unknown statistical foundation is especially problematic for researchers attempting to determine the relationship between efficiency and a set of exogenous policy variables. A common application of DEA is a two-stage empirical analysis, where the first stage calculates DEA estimates and the second stage uses these estimates as the dependent variable in a (usually truncated) regression to determine how exogenous policy variables influence efficiency (Simar and Wilson, 2007). Unfortunately, the two-stage approach generally requires additional assumptions that may have detrimental statistical consequences. One possibility is to assume that DEA scores follow a particular distribution, which then allows the researcher to employ standard maximum likelihood (or Bayesian) regression techniques. Past studies have made a number of different distributional assumptions, including the normal distribution (Ray, 1991, Stanton, 2002, Gajewski et al., 2009), the truncated normal or Tobit distribution (Chilingerian, 1995, Rosenman and Friesner, 2004), the Beta distribution (Sengupta, 1998), and the Beta-Binomial distribution (Sohn and Choi, 2006). A drawback to this approach is the potential for specification bias: if one assumes an incorrect distribution (including mis-specifying the unknown structure of serial correlation), the resulting coefficient estimates will generally be biased and inconsistent.
An alternative, espoused by Simar and Wilson (2007), is to employ semiparametric regression techniques to estimate the second-stage model. They suggest supplementing an MLE-based regression with a specific form of bootstrapping to adjust for any potential serial correlation. However, this still requires specifying a distribution for the likelihood function, leaving open the possibility of specification bias. Moreover, as already noted, we will argue below that even a census of DMUs does not necessarily remove mismeasurement, so bootstrap methods cannot fully solve these problems. This may be one reason why several recent empirical studies have found that Simar and Wilson’s two-stage approach generates only small improvements in second-stage estimates (Tsionas and Papadakisa, 2010, Merkert and Hensher, 2011).
Researchers have attempted to avoid the preceding issues by deriving explicit asymptotic distributions for DEA scores, along with the rates at which randomly sampled DEA scores converge to those distributions. Banker (1993) proved that a distribution of DEA scores exists and established their consistency for the single-output case. Kneip et al. (1998) identified the asymptotic rate of convergence for DEA estimates, while Gijbels et al. (1999) derived the asymptotic distribution of DEA scores for a single output and a single input. Kneip et al. (2003) extended these findings to the multiple-input, multiple-output case, and Jeong (2004) derived a more empirically tractable version of Kneip, Simar and Wilson’s distribution. All of this work depends on specific regularity conditions, and none of it addresses the empirically realistic case of a relatively small finite population sampled without replacement.
Jeong’s (2004) work notwithstanding, the asymptotic distributions derived in past studies are not easily implemented because they do not belong to traditional parametric families; nonparametric methods, complex numerical integration, or other approximation techniques are generally required to construct confidence intervals for DEA scores. Another shortcoming is that the assumptions needed to derive these distributions do not apply to a wide range of empirical DEA studies. For example, virtually all studies assume that the inputs and outputs are iid random vectors; if researchers sample from finite populations without replacement, this assumption cannot hold. Additionally, studies such as Gijbels et al. (1999) and Kneip et al. (2003) assume that the production frontier is smooth and twice continuously differentiable. This assumption is inappropriate for production technologies in which outputs and/or inputs are non-substitutable (i.e., a Leontief technology) or exhibit changing degrees of substitutability (Arnade and Pick, 2000, Zofio and Prieto, 2007, Antony, 2010). Absent these assumptions, the distributions and rates of convergence identified in these studies will generally not apply.
In this paper we treat the incidence of inefficiency in a population of DMUs as a latent variable, and the sample DEA estimates as observations that provide useful but noisy information about the value of that latent variable. Together with a prior distribution on the incidence of inefficiency, which can be either informative or uninformative, we derive a posterior distribution for the incidence of inefficiency that accounts for the noise in the observations. Empirical characterizations of the posterior distribution can then be used to generate credible region (or Bayesian confidence interval) estimates. The approach places little a priori structure on the nature of the production process being studied, so inferences apply in very general problem contexts, including cases where the efficient frontier is not (twice) continuously differentiable or where the technology is not representable by a parametric functional form. Moreover, the probability distributions used in the statistical model derive directly from the sampling method and the prior information employed, so errors in distributional specification are avoided.
The remainder of this paper proceeds as follows. First, we identify an appropriate posterior probability distribution for the true incidence of inefficiency in a finite population of DMUs, based on random samples of DEA estimates drawn without replacement, under the idealized assumption that the sample outcomes correctly categorize DMUs as inefficient or not. While the empirical relevance of this model is limited, it yields results relevant to the existing DEA literature and allows the analysis to focus initially on sampling variability alone, abstracting from mismeasurement in the DEA information. We next analyze the empirically more relevant case in which the DEA estimates may not characterize DMU inefficiency accurately, and derive a posterior distribution for the incidence of inefficiency that accounts for both sampling variability and potential mismeasurement in the DEA estimates. We then provide a numerical example and an application that illustrate the implementation of the methodology. The final section discusses implications of our results for the DEA literature and offers suggestions for future research.
Section snippets
General problem context
Consider a population of N DMUs, all of which operate in the same industry. Each of these DMUs utilizes a p × 1 vector of inputs to produce a q × 1 vector of outputs. Following the notation of Fare and Primont (1995) and Simar and Wilson (2007), let x ∈ ℝ+^p denote the vector of inputs and y ∈ ℝ+^q denote the vector of outputs. The conceptual set of feasible production possibilities, which subsumes all of the production technologies of the DMUs in the population under study, is defined by:

Ψ = {(x, y) ∈ ℝ+^(p+q) : x can produce y}
Inferring the incidence of non-latent inefficiency
In this section we operate within an idealized setting where DEA estimates correctly categorize any DMU in the sample as to whether it is efficient relative to the population production frontier. While unrealistic in actual applied settings, this idealization, which makes the incidence of inefficiency non-latent, provides the foundation on which we later build the method for analyzing the incidence of unobservable latent inefficiency.
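Under this idealization, the count of inefficient DMUs in a sample drawn without replacement follows a hypergeometric distribution, and a posterior over the population count follows directly. A minimal sketch, where the uniform prior and the specific numbers are our illustrative assumptions rather than the paper's:

```python
import math

def posterior_inefficient_count(N, n, x):
    """Posterior over K, the number of inefficient DMUs in a population
    of N, given that x of n DMUs sampled without replacement were
    (correctly) classified as inefficient. Uses a hypergeometric
    likelihood and a uniform prior on K in {0, ..., N}."""
    def lik(K):
        if x > K or n - x > N - K:
            return 0.0  # such a sample is impossible under this K
        return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)

    w = [lik(K) for K in range(N + 1)]
    s = sum(w)
    return [v / s for v in w]

# Example: population of 150, sample of 30, 12 classified inefficient;
# the posterior mode lands near 150 * (12 / 30) = 60.
post = posterior_inefficient_count(150, 30, 12)
```

Credible intervals for K then come directly from the cumulative posterior, with no distributional assumption beyond the sampling scheme itself.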
Given that random sampling is without replacement, the
Inferring the incidence of latent inefficiency
In empirical analyses, the production frontier is generally unknown, and using the sample-based DEA procedure to categorize DMUs as inefficient results in a potentially downward-biased count of inefficient firms in the sample which can subsequently bias the estimate of the proportion of inefficient firms in the population.
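To see how such misclassification can be folded into the likelihood, consider a simplified sketch (ours, not the paper's exact model): each truly inefficient sampled DMU is flagged by DEA only with a known probability pi, efficient DMUs are never misflagged, and the truly inefficient sample count remains hypergeometric.

```python
import math

def posterior_latent(N, n, x, pi):
    """Posterior over K, the number of truly inefficient DMUs in a
    population of N, given that x of n sampled DMUs were flagged
    inefficient. Assumes (hypothetically) a known detection probability
    pi for truly inefficient DMUs, no false flags, and a uniform prior."""
    def lik(K):
        total = 0.0
        # k = number of truly inefficient DMUs in the sample
        for k in range(x, min(K, n) + 1):
            if n - k > N - K:
                continue  # not enough efficient DMUs to fill the sample
            hyper = math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)
            detect = math.comb(k, x) * pi**x * (1 - pi)**(k - x)
            total += hyper * detect
        return total

    w = [lik(K) for K in range(N + 1)]
    s = sum(w)
    return [v / s for v in w]

# With pi < 1 the posterior shifts upward relative to the naive count,
# since some inefficiency goes undetected by the sample-based frontier.
post = posterior_latent(150, 30, 12, 0.8)
```

With pi = 1 the model collapses to the perfect-classification (hypergeometric) case, so the naive posterior is recovered as a special case.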
A simple numerical illustration
As a simple illustration of the preceding methodology, assume there are N = 150 DMUs in an industry under study, the DMUs produce q = 2 outputs, and the technology for producing those outputs is characterized by m = 8 distinct rays. The assumed characteristics of the population, including the number of efficient and inefficient firms along each ray, are displayed in Table 1.
Conclusions
In this paper, we used Bayesian methods to derive appropriate posterior distributions for the incidence of inefficient firms in an industry, based on information calculated by DEA when sampling from a finite population. Through interpreting the true incidence of inefficiency as a latent variable, the noise and potential mismeasurement of efficiency inherent in the DEA scores can be explicitly integrated into the analysis and properly accounted for in the definition of posterior distributions.
References (40)
- The utility of returns to scale in DEA programming: an analysis of Michigan rural hospitals. European Journal of Operational Research (2005).
- Charnes et al. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research.
- DEA model with shared resources and efficiency decomposition. European Journal of Operational Research (2010).
- Chilingerian (1995). Evaluating physician efficiency in hospitals: a multivariate analysis of best practices. European Journal of Operational Research.
- Merkert and Hensher (2011). The impact of strategic management and fleet planning on airline efficiency: a random effects Tobit model based on DEA efficiency scores. Transportation Research Part A: Policy and Practice.
- Gajewski et al. (2009). Employing super-efficiency analysis as an alternative to DEA: an application in outpatient substance abuse treatment. European Journal of Operational Research.
- An alternative approach to monetary aggregation in DEA. European Journal of Operational Research (2010).
- Simar and Wilson (2007). Estimation and inference in two-stage, semiparametric models of production processes. Journal of Econometrics.
- Stanton (2002). Trends in relationship lending and factors affecting relationship lending efficiency. Journal of Banking and Finance.
- Exploring output quality targets in the provision of perinatal care in England using data envelopment analysis. European Journal of Operational Research (1995).
- Tsionas and Papadakisa (2010). A Bayesian approach to statistical inference in stochastic DEA. Omega.
- Antony (2010). A class of changing elasticity of substitution production functions. Journal of Economics.
- Arnade and Pick (2000). Seasonal oligopoly power: the case of the US fresh fruit market. Applied Economics.
- Banker (1993). Maximum likelihood, consistency and data envelopment analysis: a statistical foundation. Management Science.
- An Introduction to Efficiency and Productivity Analysis.
- The impact of financial incentives on physician productivity in medical groups. Health Services Research.
- Debreu (1951). The coefficient of resource utilization. Econometrica.
- Fare and Primont (1995). Multi-Output Production and Duality: Theory and Applications.
- Farrell (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A.