A robust optimization approach for imprecise data envelopment analysis

doi:10.1016/j.cie.2010.05.011

Computers & Industrial Engineering

Volume 59, Issue 3, October 2010, Pages 387-397

https://doi.org/10.1016/j.cie.2010.05.011

Abstract

Crisp input and output data are fundamentally indispensable in traditional data envelopment analysis (DEA). However, the input and output data in real-world problems are often imprecise or ambiguous. Some researchers have proposed interval DEA (IDEA) and fuzzy DEA (FDEA) to deal with imprecise and ambiguous data in DEA. Nevertheless, many real-life problems use linguistic data that cannot be used as interval data, and a large number of input variables in fuzzy logic could result in a significant number of rules that are needed to specify a dynamic model. In this paper, we propose an adaptation of the standard DEA under conditions of uncertainty. The proposed approach is based on a robust optimization model in which the input and output parameters are constrained to be within an uncertainty set with additional constraints based on the worst case solution with respect to the uncertainty set. Our robust DEA (RDEA) model seeks to maximize efficiency (similar to standard DEA) but under the assumption of a worst case efficiency defined by the uncertainty set and its supporting constraint. A Monte-Carlo simulation is used to compute the conformity of the rankings in the RDEA model. The contribution of this paper is fourfold: (1) we consider ambiguous, uncertain and imprecise input and output data in DEA; (2) we address the gap in the imprecise DEA literature for problems not suitable or difficult to model with interval or fuzzy representations; (3) we propose a robust optimization model in which the input and output parameters are constrained to be within an uncertainty set with additional constraints based on the worst case solution with respect to the uncertainty set; and (4) we use Monte-Carlo simulation to specify a range of Gamma in which the rankings of the DMUs occur with high probability.

Introduction

DEA is a methodology for evaluating and measuring the relative efficiencies of a set of decision making units (DMUs) that use multiple inputs to produce multiple outputs. The DEA method is based on the economic notion of Pareto optimality, which states that a DMU is considered to be inefficient if some other DMUs can produce at least the same amount of output with less of the same input and not more of any other inputs. Otherwise, a DMU is considered to be Pareto efficient. Due to its solid underlying mathematical basis and wide applications to real-world problems, much effort has been devoted to the DEA models since the pioneering work of Charnes, Cooper, and Rhodes (1978).
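The dominance idea behind Pareto efficiency can be illustrated with a minimal Python sketch. It is a simplification: it compares only observed units pairwise, whereas DEA also compares a unit against combinations of other units. The function names and the three sample units are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[Sequence[float], Sequence[float]]  # (inputs, outputs)


def dominates(a: Unit, b: Unit) -> bool:
    """Return True if unit a uses no more of any input and produces no less
    of any output than unit b, and is strictly better in at least one."""
    (in_a, out_a), (in_b, out_b) = a, b
    no_worse = (all(x <= y for x, y in zip(in_a, in_b))
                and all(x >= y for x, y in zip(out_a, out_b)))
    strictly_better = (any(x < y for x, y in zip(in_a, in_b))
                       or any(x > y for x, y in zip(out_a, out_b)))
    return no_worse and strictly_better


def non_dominated(units: Dict[str, Unit]) -> List[str]:
    """Names of units that no other observed unit dominates."""
    return [name for name, u in units.items()
            if not any(dominates(v, u)
                       for other, v in units.items() if other != name)]


# Hypothetical data: one input and one output per unit.
units = {
    "A": ((2.0,), (4.0,)),
    "B": ((3.0,), (4.0,)),  # same output as A with more input
    "C": ((1.0,), (1.0,)),  # less input than A, but also less output
}
print(non_dominated(units))
```

In this sample, B is dominated by A, while A and C are not comparable by pairwise dominance alone; a DEA model would go further and assign each unit a relative efficiency score.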

In the conventional DEA, all the data assume the form of specific numerical values. However, the observed values of the input and output data in real-life problems are sometimes imprecise or vague. The imprecise or vague data in the DEA models have been examined in the literature in different ways. Some DEA applications propose the exclusion of the units that have imprecise or vague values from the analysis (O’Neal, Ozcan, & Yanqiang, 2002). This approach is not suitable for DEA as it affects the efficiency of the other DMUs due to the comparative evaluation which may possibly disturb the statistical properties of the relative efficiencies of the DMUs (Simar & Wilson, 2000). Other approaches use imputation techniques to estimate the exact approximations of the imprecise or vague values. The imputation techniques used in DEA may lead to misleading efficiency results because of the stability problems where a unit accepting an infinitesimal perturbation may change its classification from an efficient to an inefficient status or vice versa (Cooper, Seiford, & Tone, 1999).

The stochastic approach is also used to model uncertainty in the DEA literature. This approach involves specifying a probability distribution function (e.g., normal) for the error process (Sengupta, 1992). However, as pointed out by Sengupta (1992), the stochastic approach has two drawbacks:

(a) Small sample sizes in DEA make it difficult to use stochastic models.
(b) In stochastic approaches, the decision maker is required to assume a specific error distribution (e.g., normal or exponential) to derive specific results. However, this assumption may not be realistic because on an a priori basis there is very little empirical evidence to choose one type of distribution over another.

More recently, imprecise or vague data have been expressed by two approaches: the interval DEA first proposed by Cooper, Park, and Yu (1999) and the fuzzy DEA first proposed by Sengupta (1992). Cooper, Park, et al. (1999) have developed an interval approach that permits mixtures of imprecise and precise data by transforming the DEA model into an ordinary linear programming (LP) form. One of the difficulties in the interval approach is the evaluation of the lower and upper bounds of the relative efficiencies of the DMUs. In spite of this difficulty, several researchers have proposed different variations of the interval approach (Despotis and Smirlis, 2002; Entani et al., 2002; Kao, 2006; Kao and Liu, 2000; Wang et al., 2005). Despotis and Smirlis (2002) have developed an interval approach for dealing with imprecise data in DEA by transforming a non-linear DEA model to an LP equivalent. The upper and lower bounds for the efficiency scores of the DMUs are defined. They use a post-DEA model and the endurance indices to discriminate among the efficient DMUs. They further formulate another post-DEA model to determine input thresholds that turn an inefficient DMU into an efficient one.

The concerns related to the lack of robustness of the efficiency frontier and the probabilistic feasibility of the inequality constraints in DEA motivated Sengupta (1992) to propose a fuzzy approach and use a fuzzy linear programming transformation as a viable approach in such situations. In the fuzzy approach, several fuzzy mathematical programming approaches are proposed such as possibilistic programming and α-cut approaches to assess the relative efficiency of the DMUs (Guo and Tanaka, 2001; Lertworasirikul et al., 2003; León et al., 2003; Saati et al., 2002). However, sometimes the complexity of the fuzzy approach can grow exponentially. Soleimani-damaneh, Jahanshahloo, and Abbasbandy (2006) have addressed the pitfalls of some fuzzy DEA models in the literature.

Lertworasirikul et al. (2003) have proposed a possibility approach to the treatment of various fuzzy DEA models. However, Soleimani-damaneh et al. (2006) showed that their model results in unbounded optimal values and has limited applicability in real-world problems. In another paper, Guo and Tanaka (2001) introduced an α-cut based approach that changed a fuzzy DEA model to a bi-level LP model. Soleimani-damaneh et al. (2006) showed that this model cannot be generalized because it has an optimal solution only under a specific restrictive condition. In spite of the concerns raised by Soleimani-damaneh et al. (2006), Guo and Tanaka (2008) used the fuzzy DEA model proposed by Guo and Tanaka (2001) and introduced a fuzzy aggregation framework for integrating multiple attribute fuzzy values. Furthermore, Guo (2009) used the model proposed by Guo and Tanaka (2001, 2008) in a case study for a restaurant location problem in China. Kao and Liu (2000) proposed a technique which transforms a fuzzy DEA model into a family of crisp DEA models by applying the α-cut approach. Their technique requires solving multiple LP problems to approximate the membership function of the efficiency measure and to assess a DMU. Soleimani-damaneh et al. (2006) show that this model is computationally expensive. This considerable shortcoming holds for some other fuzzy DEA models (Guo and Tanaka, 2001; Jahanshahloo et al., 2004; León et al., 2003).

Liu (2008) developed a fuzzy DEA model to find the efficiency measures embedded with the assurance region (AR) concept. He applied an α-cut approach and Zadeh’s extension principle to transform the fuzzy DEA/AR model into a pair of parametric mathematical programs in order to work out the lower and upper bounds of the efficiency scores of the DMUs. The membership function of efficiency was approximated by using different possibility levels. Jahanshahloo, Sanei, Rostamy-Malkhalifeh, and Saleh (2009) commented on the fuzzy DEA model proposed by Liu (2008) and corrected the proof of his theorem. Liu and Chuang (2009) further used the fuzzy DEA/AR model suggested by Liu (2008) to evaluate the performance of 24 university libraries in Taiwan. Soleimani Damaneh (2008) used a fuzzy signed distance and fuzzy upper bound concepts to formulate a fuzzy additive model in DEA with fuzzy input–output data. Soleimani-damaneh (2009) put forward a theorem on the fuzzy DEA model proposed by Soleimani Damaneh (2008) to show the existence of a distance-based upper bound for the objective function of the model. Hatami-Marbini and Saati (2009) proposed a fuzzy DEA model to assess the efficiency scores in fuzzy environments. They applied the fuzzy number ranking method proposed by Asady and Zendehnam (2007) and obtained the precise efficiency scores of sixteen bank branches in Iran. Wang, Luo, and Liang (2009) proposed two fuzzy DEA models with fuzzy inputs and outputs by means of fuzzy arithmetic. They converted each proposed fuzzy model into three linear programming models in order to calculate the efficiencies of the DMUs as fuzzy numbers and rank them. Although these studies have made great strides in DEA research, none of them addresses the gap in the imprecise DEA literature for problems not suitable or difficult to model with interval or fuzzy representations.

In this paper, we propose a robust optimization method for dealing with data uncertainties that covers the interval approach results with less complexity than the fuzzy approach. This method is based on the adaptation of recently developed robust optimization approaches proposed by Ben-Tal and Nemirovski (2000) and Bertsimas et al. (2004). Robust optimization was first introduced by Soyster (1973), who discussed, in a very specific setting, uncertain hard constraints in linear programming models. This topic was subsequently discussed at length by Ben-Tal and Nemirovski (1998, 1999), who proved, in relation to some specified uncertain data sets, that the robust counterpart of convex programming is a computationally solvable optimization problem. Further attempts to develop a theory for robust optimization were made by Bertsimas and Sim (2004), El-Ghaoui and Lebret (1997), and El-Ghaoui et al. (1998). Sadjadi and Omrani (2008) have proposed a robust DEA model with consideration of uncertainty on output parameters for the performance assessment of electricity distribution companies. They show their robust DEA approach is a more usable method for ranking alternative strategies compared to the existing DEA methods.

This paper is organized into five sections. In Section 2, we present the fundamentals of the DEA model with precise and interval data. In Section 3, we illustrate the mathematical details of the proposed robust DEA framework. In Section 4, we demonstrate some attractive features of the proposed model with experimental results. Finally, in Section 5, we sum up our conclusions and future research directions.

Section snippets

The DEA model with precise and interval data

In this section, we present the basic concepts of the DEA model with precise and interval data. Let us assume that n DMUs convert m inputs into s outputs. The following procedure is used to obtain the relative efficiency of each DMU. Suppose x_ij (i = 1, … , m, j = 1, … , n) and y_rj (r = 1, … , s, j = 1, … , n) are the ith input and the rth output of DMU_j, respectively. The relative efficiency of DMU_p, p ∈ {1, … , n}, is defined as the maximum value of θ_p and can be obtained by using the following linear programming (LP)
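The LP model itself is not reproduced in this excerpt. As a rough illustration only, the Python sketch below covers the special case of one input and one output under constant returns to scale, where the standard CCR efficiency reduces to each unit's output/input ratio divided by the best ratio. It also computes interval efficiency bounds in the spirit of the interval approach described in the introduction: a unit's upper bound pairs its most favourable data with its rivals' least favourable data, and the lower bound does the reverse. That the paper's model takes the CCR form is an assumption, and the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def ccr_efficiency_single(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Relative efficiency with one input x[j] and one output y[j] per DMU
    (assumed CCR, constant returns to scale): ratio over the best ratio."""
    ratios = [yj / xj for xj, yj in zip(x, y)]
    best = max(ratios)
    return [r / best for r in ratios]


def interval_efficiency_bounds(
    x_lo: Sequence[float], x_hi: Sequence[float],
    y_lo: Sequence[float], y_hi: Sequence[float],
) -> List[Tuple[float, float]]:
    """(lower, upper) efficiency of each DMU when its single input and
    single output are only known to lie in [x_lo, x_hi] and [y_lo, y_hi]."""
    n = len(x_lo)
    bounds = []
    for p in range(n):
        # Upper bound: DMU p at its best, every rival at its worst.
        own_best = y_hi[p] / x_lo[p]
        rivals_worst = [y_lo[j] / x_hi[j] for j in range(n) if j != p]
        upper = own_best / max([own_best] + rivals_worst)
        # Lower bound: DMU p at its worst, every rival at its best.
        own_worst = y_lo[p] / x_hi[p]
        rivals_best = [y_hi[j] / x_lo[j] for j in range(n) if j != p]
        lower = own_worst / max([own_worst] + rivals_best)
        bounds.append((lower, upper))
    return bounds


# Hypothetical precise data for three DMUs.
precise = ccr_efficiency_single([2.0, 4.0, 5.0], [4.0, 4.0, 10.0])

# Hypothetical interval data: DMU 0 is precise, DMU 1 is imprecise.
bounds = interval_efficiency_bounds(
    x_lo=[2.0, 4.0], x_hi=[2.0, 5.0],
    y_lo=[4.0, 4.0], y_hi=[4.0, 5.0],
)
print(precise, bounds)
```

With several inputs and outputs the ratio shortcut no longer applies, and each efficiency score requires solving an LP for the unit under evaluation.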

The robust DEA model

In this section, we present the mathematical details of the robust DEA model proposed in this paper. Let us consider DMU_j and assume that $J_{j}^{x}$ and $J_{j}^{y}$ are the index sets of the imprecise input and output values, respectively. Let us further consider parameters $γ_{j}^{x}$ and $γ_{j}^{y}$, not necessarily integer, that assume values in the bounded intervals $[0, | J_{j}^{x} |]$ and $[0, | J_{j}^{y} |]$, where |⋅| denotes the cardinality of a set. The role of the parameters $γ_{j}^{x}$ and $γ_{j}^{y}$ is to adjust the robustness of the proposed model
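How a non-integer budget parameter can tune robustness is sketched below, assuming the budget-of-uncertainty construction of Bertsimas and Sim (2004), which the introduction cites as a basis for the method. Whether the paper's robust DEA model uses exactly this protection term is not shown in this excerpt. Under that assumption, at most ⌊γ⌋ uncertain coefficients deviate fully and one more deviates by the fractional part of γ. The function name and the sample deviations are hypothetical.

```python
import math
from typing import Sequence


def protection(deviations: Sequence[float], gamma: float) -> float:
    """Worst-case total deviation under a budget gamma in [0, len(deviations)]:
    the floor(gamma) largest deviations count fully, and the next largest
    counts in proportion to the fractional part of gamma."""
    if not 0 <= gamma <= len(deviations):
        raise ValueError("gamma must lie in [0, number of uncertain terms]")
    ranked = sorted((abs(d) for d in deviations), reverse=True)
    whole = math.floor(gamma)
    total = sum(ranked[:whole])
    if whole < len(ranked):
        total += (gamma - whole) * ranked[whole]
    return total


# Hypothetical maximum deviations of three imprecise values.
devs = [3.0, 1.0, 2.0]
for g in (0, 1.5, 3):
    print(g, protection(devs, g))
```

A budget of 0 gives no protection and corresponds to the nominal model, while a budget equal to the number of uncertain terms protects against every deviation at once, the fully conservative case associated with Soyster (1973).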

The experimental results

In this section, we will explain some attractive aspects of our framework through two examples. In the first example, we present a problem with five DMUs, one interval input and one interval output. Initially, we utilize a pictorial view of the problem followed by a solution based on model (13) using the generalized algebraic modeling system (GAMS) for different combinations of Γ values. Then, we determine the optimal weight of each DMU and the overall ranking order of the DMUs based on the obtained θ_j