1 Introduction

DEA originates from the seminal work of Farrell (1957) and was first introduced by Charnes et al. (1978). It evaluates the performance of decision making units (DMUs) by considering multiple inputs and outputs simultaneously, without requiring prior weights or a specific functional form for the production function. DEA has been demonstrated to be a highly effective tool for performance evaluation and benchmarking, and a number of extensions (Andersen and Petersen 1993; Banker et al. 1984; Branda and Kopa 2014; Cooper et al. 2000) and wide-ranging applications (Bergendahl 1998; García-Sánchez et al. 2013; Sahoo et al. 2011; Zhong et al. 2011) have been introduced.

However, the nature of DEA makes the number of efficient DMUs strongly influenced by the number of variables (inputs and outputs) (Cinca and Molinero 2004). Generally, the more variables a DEA model has, the more DMUs are rated efficient, and thus the less discerning the DEA analysis is (Jenkins and Anderson 2003). It is therefore necessary to improve the discriminatory ability of DEA. To this end, a few studies have attempted to deal with the problem. Principal component analysis (PCA), a dimension-reduction methodology that replaces a relatively large number of variables with uncorrelated linear combinations of the original inputs and outputs, has been utilized to improve the discrimination of DEA with a minimal loss of information. This is the so-called PCA-DEA approach (Adler and Golany 2002; Shanmugam and Johnson 2007). Similar approaches have been applied to measure deregulated airline networks (Adler and Golany 2001), to evaluate airport quality (Adler and Berechman 2001), to choose DEA specifications and rank units (Cinca and Molinero 2004), and to increase the discriminatory power of DEA in the presence of undesirable outputs (Liang et al. 2009). Subsequently, a multivariate statistical approach was introduced by Jenkins and Anderson (2003), which deleted some variables that were highly correlated with the retained ones. They concluded that deleting even highly correlated variables could have a major impact on the calculated DEA efficiencies [see also Dyson et al. (2001)]. Accordingly, they utilized partial covariance analysis instead of simple correlation to choose a subset of variables while minimizing the loss of information; in other words, the majority of the information contained in the original data matrices is retained in this way. This is the so-called variable reduction (VR) approach based on partial covariance analysis. Adler and Yazhemsky (2010) applied Monte Carlo simulation to compare the PCA-DEA and VR methodologies, and demonstrated that the former provided a more powerful tool than the latter, with consistently more accurate results. Since these approaches improve the discriminatory power of DEA by reducing the dimensionality of the variables, some valuable information is inevitably lost, however prudent the analyst is. Therefore, this paper proposes a Renyi's entropy-DEA approach that improves the discriminatory ability of DEA within the DEA structure without losing any variable information and without requiring any additional preferential information or assumptions. Renyi's entropy (Principe 2010; Renyi 1961) measures the diversity, uncertainty, or randomness of a system. This parametric family of entropies was introduced by Alfred Renyi in the mid 1950s as a mathematical generalization of Shannon entropy: Renyi sought the most general class of information measures that preserved the additivity of statistically independent systems and was compatible with Kolmogorov's probability axioms. Xie et al. (2014) used Shannon entropy to increase the discriminatory power of DEA, and Shannon entropy is the special case of Renyi's entropy at \(\alpha =1\). Renyi's entropy is much more flexible because the parameter \(\alpha \) enables several measurements of uncertainty (or dissimilarity) within a given distribution. In many cases Renyi's entropy outperforms Shannon entropy, so its theoretical and practical performance is of great significance to study.
The larger the value of Renyi's entropy, the greater the impurity of information, and the greater the uncertainty of the node. Based on Renyi's entropy, we can easily obtain the purity of information (PI). The greater the uncertainty of a node, the less important the node is, and thus the smaller its PI value.

Fig. 1

The causal relationships among the number of variables, efficient DMUs, the uncertainty of variables, the discrimination of DEA and the value of PI

As depicted in Fig. 1, it is appropriate to use the PI to evaluate the importance of variable sets, since it accords with the causal relationship between the number of variables and the DEA efficiencies: an excessive number of variables generally leads to more DEA-efficient DMUs, and the uncertainty increases accordingly. Besides, the PI can be computed simply and quickly. Considering these advantages and characteristics of the PI, we first construct a classification and regression tree (CART) by listing all possible variable subsets as nodes and treating each DMU as a different class. Subsequently, we evaluate the DEA efficiencies under each node (i.e., each subset of variables) and then obtain the corresponding PI. Finally, we generate a comprehensive efficiency score (CES) by integrating the importance of the nodes, which is normalized from the PI, into the corresponding efficiencies. The process does not require any additional preferential information or assumptions, and the proposed approach noticeably increases the discrimination strength of DEA without losing valuable information.

The remainder of this study is organized as follows: Sect. 2 presents the problem formulation, the Renyi's entropy-DEA approach and an accelerating procedure. Sect. 3 then applies the proposed approach to two examples to evaluate its performance. Conclusions are given in Sect. 4.

2 Methodology formulation

2.1 Problem formulation

DEA is a non-parametric methodology for evaluating the relative efficiency of DMUs with multiple inputs and outputs. The discrimination problem emerges when the number of inputs and outputs is relatively large compared with the number of DMUs; in this situation some inefficient DMUs might be incorrectly classified as efficient (Adler and Yazhemsky 2010), which directly weakens the ability of DEA. To avoid the curse of dimensionality, Friedman (1998) suggested that the number of variables should be less than a third of the number of DMUs. Simar and Wilson (2000) proposed that the number of observations (i.e., DMUs) should increase exponentially with the addition of variables; however, the precise convergence of non-parametric estimators relies not only on the number of variables relative to the number of observations but also on unknown smoothing constants (Kneip et al. 1998; Simar and Wilson 2000), so it is impossible to make general statements about the number of observations required to achieve a given level of mean-square error. Unfortunately, a large number of DMUs is usually not available in practice. Moreover, all factors influencing the production process can be considered potential variables, and it is hard for analysts to make the best selection due to the limited rationality of human beings (Pastor et al. 2002). As a result, there is often an excessive number of variables relative to the number of DMUs in practice, which urgently raises the need for discrimination-increasing methodologies. Furthermore, we require in particular that variables not be deleted, since some unknown but significant information may otherwise be lost, and that no assumptions be made, to keep the proposed approach meaningful.

2.2 The Renyi’s entropy-DEA approach

Suppose there is a set of DMUs, and each \(DMU_j,j \in S = \{ 1,2,\dots ,n\}\) consumes m inputs \(x_{ij}^{}(i \in I = \{ 1,2,\dots ,m\} )\) to yield s outputs \(y_{rj}^{}(r \in O = \{ 1,2,\dots ,s\} )\), then based upon the DEA model, the efficiency score for any given \(DMU_0\) under evaluation can be calculated as follows:

$$\begin{aligned} \begin{array}{l} \max \sum \limits _{r = 1}^s {{\mu _r}{y_{ro}}} + \mu = E_0^{}(I,O)\\ s.t. ~\sum \limits _{r = 1}^s {{\mu _r}{y_{rj}}} - \sum \limits _{i = 1}^m {{\upsilon _i}{x_{ij}}} + \mu \le 0,\quad \forall j \in S\\ ~~~~~\sum \limits _{i = 1}^m {{\upsilon _i}{x_{io}}} = 1\\ ~~~~~{\upsilon _i},{\mu _r} \ge 0,\quad \forall i \in I,\quad r \in O \end{array} \end{aligned}$$
(1)

where \(E_0^{}(I,O)\) is the relative efficiency of \(\textit{DMU}_0\), and \({\mu _r},{\upsilon _i}\) are unknown weights attached to the rth output and ith input, respectively. Model (1) allows \(DMU_0\) to freely choose its most favorable weights so as to maximize its efficiency. Therein, \(\mu \) determines the returns to scale: model (1) is a standard input-oriented CCR model if \(\mu = 0\) (Charnes et al. 1978); otherwise it is a standard input-oriented BCC model when \(\mu \) is free (Banker et al. 1984). The discrimination problem exists in all DEA models.
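To make model (1) concrete, the following minimal Python sketch solves its input-oriented CCR case (\(\mu = 0\)) as a linear program with scipy.optimize.linprog; the function and variable names are our own illustrative choices, not part of the original paper. The BCC case would simply add a free variable \(\mu \) to the objective and the first set of constraints.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU o (multiplier form of model (1) with mu = 0).

    X: (m, n) input matrix, Y: (s, n) output matrix, o: index of the DMU under evaluation.
    Decision variables are ordered as [v_1..v_m, u_1..u_s] (input and output weights).
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.concatenate([np.zeros(m), -Y[:, o]])          # maximize u'y_o  <=>  minimize -u'y_o
    A_ub = np.hstack([-X.T, Y.T])                        # u'y_j - v'x_j <= 0 for every DMU j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([X[:, o], np.zeros(s)]).reshape(1, -1)   # normalization v'x_o = 1
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (m + s), method="highs")
    return -res.fun                                      # optimal efficiency score E_o(I, O)
```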

Theoretically, a DEA model has at least one input and one output (Wagner and Shimshak 2007). Denote by \(K = (2_{}^m - 1) \times (2_{}^s - 1)\) the number of different combinations of non-empty input subsets and output subsets drawn from the input set I and the output set O, respectively. \(M_k\) is the \(k^{th}\) combination of variable subsets for a DEA model, and the collection of all variable subsets is denoted by \(\Omega = \{ {M_1},{M_2},\dots ,{M_K}\}\). The efficiency score of \(\textit{DMU}_j\) based upon \(M_k\) is \(E_{jk},j = 1,\dots ,n, k = 1,\dots ,K\). If we solve the model K times, once for each combination of variable subsets, we obtain the efficiency score matrix \({[{E_{jk}}]_{n \times K}}\) as follows:

$$\begin{aligned} \begin{array}{*{20}{c}} {}&{\begin{array}{*{20}{c}} {{M_1}}&{}\quad {{M_2}}&{}\quad {\cdots }&{}\quad {{M_K}} \end{array}}\\ {\begin{array}{*{20}{c}} {\textit{DMU}_1}\\ {\textit{DMU}_2}\\ \vdots \\ {\textit{DMU}_n} \end{array}}&{}{\left[ {\begin{array}{*{20}{c}} {{E_{11}}}&{}\quad {{E_{12}}}&{}\quad {\cdots }&{}\quad {{E_{1K}}}\\ {{E_{21}}}&{}\quad {{E_{22}}}&{}\quad {\cdots }&{}\quad {{E_{2K}}}\\ \vdots &{}\quad \vdots &{}{}&{}\quad \vdots \\ {{E_{n1}}}&{}\quad {{E_{n2}}}&{}\quad {\cdots }&{}\quad {{E_{nK}}} \end{array}} \right] .} \end{array} \end{aligned}$$
(2)
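Assembling the efficiency matrix (2) then amounts to enumerating the K non-empty input/output subsets and solving the model for every DMU under each of them. A rough sketch, reusing the hypothetical ccr_efficiency helper above, could look as follows; the node ordering and all names are ours.

```python
from itertools import combinations

import numpy as np

def nonempty_subsets(indices):
    """All non-empty subsets of a sequence of variable indices."""
    return [list(c) for r in range(1, len(indices) + 1)
            for c in combinations(indices, r)]

def efficiency_matrix(X, Y):
    """Return the n-by-K matrix [E_jk] and the list of nodes M_1, ..., M_K."""
    m, n = X.shape
    s = Y.shape[0]
    nodes = [(ins, outs)
             for ins in nonempty_subsets(range(m))
             for outs in nonempty_subsets(range(s))]     # K = (2^m - 1)(2^s - 1) nodes
    E = np.empty((n, len(nodes)))
    for k, (ins, outs) in enumerate(nodes):
        Xk, Yk = X[ins, :], Y[outs, :]                   # variables selected in node M_k
        for j in range(n):
            E[j, k] = ccr_efficiency(Xk, Yk, j)
    return E, nodes
```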

Considering each variable subset as a node and each \(\textit{DMU}\) as a class, we can construct a CART as depicted in Fig. 2. Apparently, if the DMUs' efficiencies under two nodes are different, the impurity of information in the two nodes will also differ. To evaluate the impurity of information of the nodes after obtaining the DEA efficiencies, Renyi's entropy (Principe 2010; Renyi 1961) is introduced. Renyi's entropy measures the impurity of information, and its mathematical definition is as follows.

Definition 1

Renyi’s entropy is defined by \(G = \frac{1}{{1 - \alpha }}\log \sum _{j = 1}^n {p_j^\alpha },\) and \(p_j^{} = {E_j} /{\sum _{j = 1}^n {E_j} }\) represents the proportion of \(\textit{DMU}_j\)’s efficiency in the node.

It is clear that the larger the value of Renyi's entropy, the greater the impurity of information, and the greater the uncertainty of the node. In this study, we set \(\alpha =2\) and define the corresponding PI of a node as follows.

Definition 2

The PI of nodes can be defined by \(d = 1 + \log \sum _{j = 1}^n {p_j^2}\).
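As a small illustration of Definitions 1 and 2, the entropy and PI of a node can be computed directly from the column of efficiency scores it produces. The sketch below uses the natural logarithm, since the base is not specified in the text, and the naming is ours.

```python
import numpy as np

def renyi_entropy(E_col, alpha=2.0):
    """Renyi's entropy G of a node from the efficiencies of all DMUs (Definition 1, alpha != 1)."""
    p = E_col / E_col.sum()                    # proportions p_j of each DMU's efficiency
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def purity_of_information(E_col):
    """PI of a node with alpha = 2 (Definition 2): d = 1 + log(sum_j p_j^2) = 1 - G."""
    p = E_col / E_col.sum()
    return 1.0 + np.log(np.sum(p ** 2))
```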

Fig. 2

Constructing the CART of variable subsets

Correspondingly, the greater the uncertainty of a node, the less discrimination it provides, and the smaller its PI. For a given model generating the efficiencies of all DMUs under the nodes \({M_k}, (k = 1,2,\dots ,K)\), we can obtain the following useful corollaries.

Corollary 1

If there is only one DMU in a node, then the minimum uncertainty of the node and the maximum PI are obtained.

Proof

When there is only one DMU in a node, we have \(p_1 = 1\), so we easily obtain \(G = 0\) and \(d = 1\) from Definitions 1 and 2, respectively. \(\square \)

Corollary 2

A small fluctuation of the efficiencies generates a small PI for the node. In particular, the maximum uncertainty of the node and the minimum PI are obtained when the efficiencies are all equal.

Proof

Assume that \(p_j^{} = \frac{1}{n} + {\varepsilon _j}, \forall j \in S\); then the Renyi's entropy with \(\alpha = 2\) is

$$\begin{aligned} {G_b}&= - \log \sum _{j = 1}^n {p_j^{2}} \\&= - \log \left[ {\left( \frac{1}{n} + {\varepsilon _1}\right) ^2} + {\left( \frac{1}{n} + {\varepsilon _2}\right) ^2} + \cdots + {\left( \frac{1}{n} + {\varepsilon _n}\right) ^2}\right] \\&= - \log \left[ \frac{1}{n} + \frac{2}{n}\left( {\varepsilon _1} + {\varepsilon _2} + \cdots + {\varepsilon _n}\right) + \left( \varepsilon _1^2 + \varepsilon _2^2 + \cdots + \varepsilon _n^2\right) \right] . \end{aligned}$$

From \(0 \le \frac{1}{n} + {\varepsilon _j} \le 1,\) we have \(- \frac{1}{n} \le {\varepsilon _j} \le 1 - \frac{1}{n}.\) Since \(\sum _{j = 1}^n {p_j^{}} = 1,\) we obtain \({\varepsilon _1} + {\varepsilon _2} + \cdots + {\varepsilon _n} = 0.\) Moreover,

$$\begin{aligned} \varepsilon _1^2 + \varepsilon _2^2 + \cdots + \varepsilon _n^2 \ge 0, \end{aligned}$$

with equality if and only if \({\varepsilon _j} = 0\) for all \(j \in S\).

Hence \(G_b= -\log [\frac{1}{n} + (\varepsilon _1^2 + \varepsilon _2^2 + \cdots + \varepsilon _n^2)] \le - \log \frac{1}{n},\) and the PI in this case is \({d_b} = 1+ \log [\frac{1}{n} + (\varepsilon _1^2 + \varepsilon _2^2 + \cdots + \varepsilon _n^2)].\) Therefore, \({d_b}\) is directly determined by the fluctuations \({\varepsilon _j}, j \in S\), and a small fluctuation of the efficiencies generates a small PI for the node. In particular, if \({\varepsilon _j}=0, \forall j \in S\), i.e., the proportions satisfy \(p_j^{} = \frac{1}{n}, \forall j \in S\) in the node, then the Renyi's entropy is \({G_b} = -\log \frac{1}{n}\) and the PI is \({d_b} = 1 + \log \frac{1}{n}\). It is easy to check that these are the maximum of \({G_b}\) and the minimum of the PI \({d_b}\). \(\square \)

Finally, we propose the Renyi’s entropy-DEA approach, which does not require any additional information or assumption, to improve the discrimination strength of DEA without losing any valuable information. The detailed scheme is stated in the following Algorithm 1.

Algorithm 1 The Renyi's entropy-DEA approach
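Since the algorithm figure is not reproduced here, the sketch below outlines the five steps described in this section in Python, reusing the hypothetical helpers introduced above; in particular, normalizing each PI by the sum of all PIs is our reading of the Normalized step, not a detail stated explicitly in this excerpt.

```python
import numpy as np

def renyi_entropy_dea(X, Y):
    """Comprehensive efficiency scores (CES) via the Renyi's entropy-DEA approach (alpha = 2)."""
    E, nodes = efficiency_matrix(X, Y)              # step 1: DEA efficiencies under every node
    pi = np.array([purity_of_information(E[:, k])   # steps 2-3: proportions, then PI of each node
                   for k in range(E.shape[1])])
    W = pi / pi.sum()                               # step 4: importance degrees (assumed normalization)
    ces = E @ W                                     # step 5: weighted sum over all nodes
    return ces, W
```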

Remark 1

For a given model generating efficiencies under all nodes, if a DMU's efficiencies under all nodes are unchanged, then its CES is also unchanged. This further illustrates that our approach is effective and reasonable.

Remark 2

Users can add sufficient variables to better characterize the real production process without worrying about the discrimination problem, and the results will be more accurate without any loss of information.

This is because a variable subset containing a greater number of variables generally yields more efficient DMUs and accordingly a smaller importance degree for that subset. In the extreme case, if all DMUs are efficient under some variable subset, that subset receives the minimum weight. Remark 2 guarantees that the evaluation results are more objective, scientific and persuasive. It should also be noted that if the data set has only one input and one output, the proposed method is equivalent to the traditional models, and that the proposed method does not rely on a particular form of the DEA model: it can be used with either constant or variable returns to scale, and with either an input or an output orientation.

2.3 An accelerating procedure

Since even a small increase in the number of variables rapidly increases the number of variable subsets, we provide an accelerating procedure to overcome this limitation and present its pseudocode in Algorithm 2. The algorithm takes the input data I and the output data O as inputs and returns the CES for all DMUs. Therein, AllVarSubsets lists all possible variable subsets and saves the result in nodes. Since the process of generating the PIs is the same for all nodes, we introduce parallel computing to accelerate the multi-node DEA computations. The parallel computing here makes full use of the resources of multicore processors: in the multicore era, serial architectures cannot take full advantage of the available cores, so programmers should restructure their code with parallelism in mind, decomposing a large-scale computing task into multiple relatively small execution units and running them on parallel architectures. FastDEA is a procedure that further speeds up large-scale DEA computations and has been extensively studied in Barr and Durchholz (1997), Chen and Cho (2009), and Dulá (2008). FastDEA, Normalized and PIDefinition in the loop correspond to steps 1 to 3 of Algorithm 1, respectively, and the loop returns the PIs of all nodes for a given DEA model. After obtaining the importance degrees \(W^k\) by passing \(\mathrm {PI^k}\) to Normalized (step 4 in Algorithm 1), the CES is generated by WeightedSum (step 5 in Algorithm 1).

Algorithm 2 An accelerating procedure for the Renyi's entropy-DEA approach
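The parallel idea can be illustrated with a rough Python sketch using the standard multiprocessing module; it distributes the per-node computations over worker processes and reuses the hypothetical helpers above. This is only meant to convey the structure of Algorithm 2, not the authors' exact FastDEA implementation.

```python
from multiprocessing import Pool

import numpy as np

def _node_job(args):
    """Work unit for one node: CCR efficiencies of all DMUs plus the node's PI."""
    X, Y, ins, outs = args
    n = X.shape[1]
    scores = np.array([ccr_efficiency(X[ins, :], Y[outs, :], j) for j in range(n)])
    return scores, purity_of_information(scores)

def accelerated_ces(X, Y, processes=4):
    """Parallel sketch of the accelerating procedure (cf. Algorithm 2)."""
    nodes = [(ins, outs)
             for ins in nonempty_subsets(range(X.shape[0]))
             for outs in nonempty_subsets(range(Y.shape[0]))]    # AllVarSubsets
    with Pool(processes) as pool:                                # one worker per core
        results = pool.map(_node_job, [(X, Y, ins, outs) for ins, outs in nodes])
    E = np.column_stack([scores for scores, _ in results])       # n-by-K efficiency matrix
    pi = np.array([p for _, p in results])
    W = pi / pi.sum()                                            # assumed normalization of the PIs
    return E @ W                                                 # CES of every DMU
```

On Windows, a call to accelerated_ces should be placed under an `if __name__ == "__main__":` guard so that the worker processes can be spawned correctly.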

3 Empirical application

3.1 A simple data set

Table 1 shows a simple data set from Liang et al. (2008), which has five DMUs with three inputs and two outputs. Obviously, the number of variables equals the number of DMUs, whereas the number of variables should be less than one third of the number of DMUs according to the guideline. Therefore, the discrimination of the traditional DEA model may be reduced in this case (Jenkins and Anderson 2003). In the following, we use this example to demonstrate the proposed method. The results reported for this data set are based upon input-oriented, constant returns to scale DEA models.

Table 1 Characteristics of the data set.

Table 2 shows that the traditional CCR model, the SuperCCR model, and the SBM model are unable to provide a full ranking for all DMUs, while all models agree that \(\textit{DMU}_3\) ranks first. Both the CCR and SBM efficiencies indicate that \(\textit{DMU}_2\) and \(\textit{DMU}_3\) are DEA efficient and should share the same rank, whereas the results of the other three models show that \(DMU_2\) performs worse than \(\textit{DMU}_3\). In fact, in the process of calculating the efficiencies of all DEA specifications, we find that \(\textit{DMU}_3\) is efficient under 20 DEA models, while \(\textit{DMU}_2\) is efficient under 12 DEA models and inefficient under the other nine. Therefore, the ranking of \(\textit{DMU}_2\) and \(\textit{DMU}_3\) produced by the Renyi's entropy-DEA approach is accurate, whereas the CCR and SBM efficiencies cannot distinguish some efficient DMUs.

Table 2 Efficiencies based upon different DEA models

In addition, both the CCR and SuperCCR efficiencies show that \(DMU_4\) and \(DMU_5\) have the same efficiency score, so they are poor at distinguishing some inefficient DMUs. Here we can also find that the GCE result has high discrimination power, because the GCE is calculated under the single variable set with three inputs and two outputs. We consider that one particular variable set is not sufficient to represent the actual performance of DMUs, whereas the CES is obtained by integrating the efficiencies over all subsets. Therefore, the CES appears more comprehensive and more representative than the GCE. Besides, the CES of the proposed model can still provide a clear ranking when \(\alpha \) in the Renyi's entropy is changed. As shown in Table 3, the DMUs' ranking R remains stable for \(\alpha >5\) in this example. Furthermore, the sensitivity of each DMU's efficiency can be observed for different values of \(\alpha \).
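The paper defines the PI only for \(\alpha = 2\); one natural reading, consistent with Definition 2 (where \(d = 1 - G\)), is to take \(d = 1 - G\) for other values of \(\alpha \) as well. The sketch below follows that assumption purely to illustrate the sensitivity analysis reported in Table 3; it is our interpretation, not a formula stated in the text.

```python
import numpy as np

def purity_general(E_col, alpha):
    """PI of a node for a general alpha, assuming d = 1 - G (our reading; requires alpha != 1)."""
    p = E_col / E_col.sum()
    G = np.log(np.sum(p ** alpha)) / (1.0 - alpha)
    return 1.0 - G
```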

Table 3 The change of CES and ranking when changing the value of \(\alpha \)

3.2 University

In this subsection, we apply the proposed approach to a real data set of universities in China. As shown in Table 4, there are 16 DMUs, representing 16 science and engineering universities, collected from the 2007 basic situation statistics assembly of universities directly under the Ministry of Education. Here \(\textit{DMU}_j, j=1,2,\dots ,16\) respectively represents Tsinghua University, Shanghai Jiao Tong University, Huazhong University of Science and Technology, Xi'an Jiaotong University, Tongji University, Southeast University, South China University of Technology, Dalian University of Technology, Northeastern University, East China University of Science and Technology, Wuhan University of Technology, Southwest Jiaotong University, University of Electronic Science and Technology of China, Xidian University, Hohai University and Hefei University of Technology. These universities share similar characteristics, which preserves the homogeneity of the DMUs, a prerequisite for using DEA. Besides, the selected universities cover many typical regions, which reflects differences in the support for higher education from local governments and in the work efficiencies of the universities. Selecting appropriate variables is important, as it directly influences the evaluation results. Unfortunately, as mentioned in Sect. 2, there is often an excessive number of variables relative to the number of DMUs in practice. Considering the main characteristics of higher education, the following variables are chosen.

Input 1: the number of faculty members (Unit: persons), which includes teaching assistants, lecturers, associate professors and professors.

Input 2: the fixed assets (Unit: ten thousand RMB).

Input 3: the appropriations (Unit: ten thousand RMB), which contain educational appropriations and scientific and technological expenditure.

Output 1: the number of graduates (Unit: persons), comprising graduate students, undergraduate students and junior college students.

Output 2: the total number of papers (Unit: papers). We count one English paper as equivalent to one and a half Chinese papers to reflect the extra costs paid by Chinese scholars, such as the first-language barrier and the relatively long publishing period (see the small sketch after this list).

Output 3: the number of invention patents.
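A purely illustrative one-line helper for the weighting rule used in Output 2; the function name is ours.

```python
def total_papers(chinese_papers, english_papers):
    """Total paper count with one English paper weighted as 1.5 Chinese papers."""
    return chinese_papers + 1.5 * english_papers
```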

Some other variables may have been omitted even though we consider many indicators to characterize the main process of higher education. By Remark 2, adding the omitted variables would yield a more precise result and has no influence on illustrating the proposed approach. Obviously, the number of variables here is more than one third of the number of DMUs, which lowers the discriminatory ability of DEA (Friedman 1998; Jenkins and Anderson 2003) and urgently raises the need for discrimination-improving methodologies. The relevant results are presented in Tables 4 and 5.

Table 4 Data of 16 universities in 2007 and the ranking results
Table 5 CCR efficiencies, the important degrees of nodes and CES calculated by the Renyi’s entropy-DEA approach with \(\alpha =2\)

First, all possible variable subsets (nodes) are listed in the first column of Table 5, where "1" means that the corresponding variable is selected in the node and "0" means that it is deleted. We then choose the input-oriented CCR model to evaluate the performance of the DMUs under the 49 nodes; our approach can equally be employed to improve the discriminatory power of other DEA models. The relevant results are shown in the second column of Table 5, where we can see that different variable subsets affect the DEA efficiencies of the DMUs. The second row from the bottom gives the result under all variables: there are 9 efficient DMUs, which means that the discrimination in this situation is very poor. The importance degrees of the nodes are shown in the last column of Table 5, where we can see that \(W_k\) is influenced by the fluctuation range of the efficiencies, e.g., \(W_1=0.021\) with a fluctuation range of [0.545, 1] and \(W_6=0.018\) with a fluctuation range of [0.156, 1]. Finally, the CESs are obtained as the weighted sum of \(W_k\) and the CCR efficiencies under the corresponding nodes, and are shown in the last row of Table 5. It is clear that no DMU is rated efficient by our approach. As shown in the last column of Table 4, ranks A and B represent the rankings based on the efficiencies in the second row from the bottom of Table 5 and on the CESs, respectively. Comparing the two ranks, we find that the CESs give a complete ranking in this case. In particular, the rank of some DMUs may change after applying our approach, such as \(\textit{DMU}_{10}\) and \(\textit{DMU}_{16}\). Apparently, our approach noticeably improves the discrimination strength of DEA without losing variable information.

Furthermore, the accelerating procedure is applied to test its performance. The computing tool is Matlab 2010a, and the computing environment is Microsoft Windows XP with an Intel Pentium dual-core CPU (E2200) and 2 GB of RAM. There are several different FastDEA approaches for solving a DEA computation with a large number of variables or DMUs under a given variable set, but this example has only 16 DMUs with 6 variables (too many DMUs would weaken the illustrative value of the example). Moreover, our main purpose is to show the performance of parallel computing in accelerating the multi-subset Renyi's entropy-DEA approach; therefore, FastDEA is replaced by the traditional DEA procedure here. The computing time is about 10.2 s with the accelerating procedure and about 19.6 s without it on our computer. Predictably, a higher speed would be reached if more cores were used.

4 Conclusions

Prior research improves the discrimination of DEA by reducing the number of variables while minimizing the loss of information. To avoid distorting the evaluation results, this paper presents an approach based on Renyi's entropy that increases the discrimination of DEA without losing variable information and without requiring any additional preferential information or assumptions. We first list all possible variable subsets and consider them as the nodes of a CART, with each DMU treated as a class. We then generate all DEA efficiencies under the nodes and calculate the PI of each node based on Renyi's entropy. The importance degrees are obtained by normalizing the PIs of the nodes, and the CES is finally generated as the weighted sum of the importance degrees and the corresponding DEA efficiencies. This process reflects the information of all variables in the final results. Moreover, an accelerating procedure is introduced to handle the multi-subset computations of the Renyi's entropy-DEA approach. The procedure is primarily based on parallel computing and substantially improves the computing speed. The two examples demonstrate that the proposed approach is valid and reasonable.

The proposed approach can be applied generally to performance evaluation problems. In future studies, we believe that some data mining techniques (Fayyad et al. 1996) could be introduced to overcome the remaining drawbacks of DEA.