Abstract
This research develops classifiers for binary classification problems with interval data, a class of problems whose difficulty is well recognized across fields. The proposed classifiers combine the ideas and techniques of quantiles and data envelopment analysis (DEA), and are therefore referred to as quantile–DEA classifiers. First, the classifiers use the concept of quantiles to generate a desired number of exact-data sets from a training-data set comprising interval data. Then, they adopt the concept and technique of an intersection-form production possibility set in the DEA framework to construct acceptance domains, each corresponding to one exact-data set and thus to one quantile. An intersection-form acceptance domain is represented by a linear inequality system, which enables the quantile–DEA classifiers to efficiently determine the groups to which large volumes of data belong. In addition, the quantile feature enables the proposed classifiers not only to help reveal patterns but also to tell the user the value or significance of these patterns.
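The first step, generating exact-data sets from interval data, can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes each interval observation is given by componentwise bounds \(a_{j}<b_{j}\) and that the β-quantile point is the linear interpolation \(x_{j}^{\beta}=a_{j}+\beta( b_{j}-a_{j})\) used in the appendices. The function names and the training data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# An interval observation: (lower-bound vector a_j, upper-bound vector b_j).
Interval = Tuple[Sequence[float], Sequence[float]]


def quantile_point(a: Sequence[float], b: Sequence[float], beta: float) -> List[float]:
    """Exact point x_j^beta = a_j + beta * (b_j - a_j), computed componentwise."""
    if len(a) != len(b):
        raise ValueError("lower and upper bound vectors must have the same length")
    return [ai + beta * (bi - ai) for ai, bi in zip(a, b)]


def exact_data_sets(intervals: Sequence[Interval],
                    betas: Sequence[float]) -> Dict[float, List[List[float]]]:
    """Map each quantile level beta to the exact-data set {x_j^beta : j = 1, ..., n}."""
    return {beta: [quantile_point(a, b, beta) for a, b in intervals] for beta in betas}


# Hypothetical training data: two DMUs, each with two interval-valued inputs.
training = [([1.0, 2.0], [3.0, 4.0]),
            ([2.0, 1.0], [2.5, 5.0])]
sets = exact_data_sets(training, betas=[0.25, 0.5, 0.75])
print(sets[0.5])  # [[2.0, 3.0], [2.25, 3.0]]
```

Each quantile level yields one exact-data set, from which one acceptance domain would then be built.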
References
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.
Charnes, A., Cooper, W. W., Wei, Q. L., & Huang, Z. M. (1989). Cone ratio data envelopment analysis and multiobjective programming. International Journal of Systems Science, 20(7), 1099–1118.
Cooper, W. W., Park, K. S., & Yu, G. (1999). IDEA and AR-REA: Models for dealing with imprecise data in DEA. Management Science, 45, 597–607.
Cooper, W. W., Seiford, L. M., & Tone, K. (2006). Introduction to data envelopment analysis and its uses: With DEA-solver software and references. New York: Springer.
Corne, D., Dhaenens, C., & Jourdan, L. (2012). Synergies between operations research and data mining: The emerging use of multi-objective approaches. European Journal of Operational Research, 221, 469–479.
Despotis, D. K., & Smirlis, Y. G. (2002). Data envelopment analysis with imprecise data. European Journal of Operational Research, 140, 24–36.
Han, J., & Kamber, M. (2007). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann Publishers.
Kao, C. (2006). Interval efficiency measures in data envelopment analysis with imprecise data. European Journal of Operational Research, 174, 1087–1099.
Pendharkar, P. C. (2002). A potential use of DEA for inverse classification problem. Omega: An International Journal of Management Science, 30, 243–248.
Pendharkar, P. C. (2011). A hybrid radial basis function and data envelopment analysis neural network for classification. Computers and Operations Research, 38, 256–266.
Pendharkar, P. C. (2012). Fuzzy classification using the data envelopment analysis. Knowledge-Based Systems, 31, 183–192.
Pendharkar, P. C., Khosrowpour, M., & Rodger, J. A. (2000). Application of Bayesian network classifiers and data envelopment analysis for mining breast cancer patterns. The Journal of Computer Information Systems, 40(4), 127–132.
Pendharkar, P. C., & Troutt, M. D. (2011). DEA based dimensionality reduction for classification problems satisfying strict non-satiety assumption. European Journal of Operational Research, 212, 155–163.
Seifert, J. W. (2004). Data mining: An overview. CRS Report for Congress, The Library of Congress, Order Code RL31798. http://www.fas.org/irp/crs/RL31798.pdf.
Seiford, L. M., & Zhu, J. (1998). An acceptance system decision rule with data envelopment analysis. Computers and Operations Research, 25(4), 329–332.
Sinha, A. P., & Zhao, H. (2008). Incorporating domain knowledge into data mining classifiers: An application in indirect lending. Decision Support Systems, 46, 287–299.
Troutt, M. D., Rai, A., & Zhang, A. (1996). The potential use of DEA for credit applicant acceptance systems. Computers and Operations Research, 23(4), 405–408.
Wei, Q. L., & Yan, H. (2001). A method of transferring polyhedron between the intersection-form and the sum-form. Computers and Mathematics with Applications, 41, 1327–1342.
Wei, Q. L., & Yu, G. (1997). Analyzing the properties of K-cone in generalized data envelopment analysis model. Journal of Econometrics, 80, 63–84.
Yan, H., & Wei, Q. L. (2000). A method of transferring cones of intersection-form to cones of sum-form and its applications in DEA models. International Journal of Systems Science, 31(5), 629–638.
Yan, H., & Wei, Q. L. (2011). Data envelopment analysis classification machine. Information Sciences, 181, 5029–5041.
Ying, M. Q., Xu, R. E., & Wei, Q. L. (1975). Stability of mathematical programming. Acta Mathematica Sinica, 18(2), 123–175.
Yu, G., Wei, Q. L., & Brockett, P. (1996). A generalized data envelopment analysis model: A unification and extension of existing methods for efficiency analysis of decision making units. Annals of Operations Research, 66, 47–89.
Zhu, J. (2003). Imprecise data envelopment analysis: A review and improvement with an application. European Journal of Operational Research, 144, 513–529.
Acknowledgments
This paper has benefited from the reviewers' suggestions, which are gratefully acknowledged. In addition, the first and third authors are partially supported by the National Natural Science Foundation of China (NNSF 71271208).
Appendices
Appendix 1: Proof of Theorem 1(i)
Theorem 1
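Let \(L<\bar{\beta}<\hat{\beta},\) and
$$ T_{\hat{\beta}}=\left\{ x\left\vert \sum\limits_{j=1}^{n}x_{j}^{\hat{\beta}}\lambda_{j}\leq x,\,\sum\limits_{j=1}^{n}\lambda_{j}\geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n\right. \right\} $$
and
$$ T_{\bar{\beta}}=\left\{ x\left\vert \sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\lambda_{j}\leq x,\,\sum\limits_{j=1}^{n}\lambda_{j}\geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n\right. \right\}. $$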
Then, \(T_{\hat{\beta}}\subset T_{\bar{\beta}}.\)
Proof
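Since \(b_{j}>a_{j},\,j=1,\ldots,n,\) if \(L<\bar{\beta}<\hat{\beta},\) then
$$ x_{j}^{\bar{\beta}}=a_{j}+\bar{\beta}\left( b_{j}-a_{j}\right) <a_{j}+\hat{\beta}\left( b_{j}-a_{j}\right) =x_{j}^{\hat{\beta}},\quad j=1,\ldots,n. $$
It follows that if \(\sum\nolimits_{j=1}^{n}\lambda_{j}\geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n,\) then
$$ \sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\lambda_{j}\leq\sum\limits_{j=1}^{n}x_{j}^{\hat{\beta}}\lambda_{j}. $$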
Thus, if \(x\in T_{\hat{\beta}},\) then \(x\in T_{\bar{\beta}};\) that is, \(T_{\hat{\beta}}\subset T_{\bar{\beta}}.\) □
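The key step of this proof, that \(\sum\nolimits_{j}x_{j}^{\bar{\beta}}\lambda_{j}\leq\sum\nolimits_{j}x_{j}^{\hat{\beta}}\lambda_{j}\) for every \(\lambda_{j}\geq0\) with \(\sum\nolimits_{j}\lambda_{j}\geq1,\) can be spot-checked numerically. The Python sketch below assumes \(x_{j}^{\beta}=a_{j}+\beta( b_{j}-a_{j})\) as in Appendix 2 and uses hypothetical interval data and function names; random testing only illustrates the inequality and is not a substitute for the proof.

```python
import random
from typing import List, Sequence, Tuple


def quantile_point(a: Sequence[float], b: Sequence[float], beta: float) -> List[float]:
    """x_j^beta = a_j + beta * (b_j - a_j), componentwise."""
    return [ai + beta * (bi - ai) for ai, bi in zip(a, b)]


def weighted_sum(points: Sequence[Sequence[float]], lam: Sequence[float]) -> List[float]:
    """Componentwise sum_j lam_j * x_j."""
    m = len(points[0])
    return [sum(w * p[i] for w, p in zip(lam, points)) for i in range(m)]


def spot_check_theorem1(intervals: Sequence[Tuple[Sequence[float], Sequence[float]]],
                        beta_bar: float, beta_hat: float,
                        trials: int = 1000, seed: int = 0) -> bool:
    """Randomly test sum_j x_j^{beta_bar} lam_j <= sum_j x_j^{beta_hat} lam_j
    for lam_j >= 0 with sum_j lam_j >= 1 (a test, not a proof)."""
    if not beta_bar < beta_hat:
        raise ValueError("need beta_bar < beta_hat")
    rng = random.Random(seed)
    low = [quantile_point(a, b, beta_bar) for a, b in intervals]
    high = [quantile_point(a, b, beta_hat) for a, b in intervals]
    for _ in range(trials):
        lam = [rng.uniform(0.01, 1.0) for _ in intervals]
        target = rng.uniform(1.0, 3.0)  # desired total weight, at least 1
        total = sum(lam)
        lam = [w * target / total for w in lam]
        lhs, rhs = weighted_sum(low, lam), weighted_sum(high, lam)
        if any(s > t for s, t in zip(lhs, rhs)):
            return False
    return True


# Hypothetical interval data with b_j > a_j componentwise.
training = [([1.0, 2.0], [3.0, 4.0]), ([2.0, 1.0], [2.5, 5.0])]
print(spot_check_theorem1(training, beta_bar=0.2, beta_hat=0.8))  # True
```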
Appendix 2: Proof of Theorem 2
To prove Theorem 2, we first present the following two lemmas:
Lemma 1
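If \(L<\bar{\beta}<\hat{\beta}\) and \(\tilde{x}\in T_{\hat{\beta}}=\left\{ x\left\vert \sum\nolimits_{j=1}^{n}x_{j}^{\hat{\beta}} \lambda_{j}\leq x,\,\sum\nolimits_{j=1}^{n}\lambda_{j}\geq1,\,\lambda_{j}\geq 0,\,j=1,\ldots,n\right. \right\}, \) then the optimal objective function value of the following linear program is less than one, i.e., \(\hat{\theta }( \bar{\beta}) <1:\)
$$ \begin{array}{lll} && \hat{\theta}( \bar{\beta}) =\min\theta,\\ \left( P_{\bar{\beta}}\right) & \hbox{s.t.}& \sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\lambda_{j}\leq\theta\tilde{x},\\ && \sum\limits_{j=1}^{n}\lambda_{j}\geq1,\\ && \lambda_{j}\geq0,\quad j=1,\ldots,n. \end{array} $$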
Proof
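Let \(T_{\bar{\beta}}=\left\{ x\left\vert \sum\nolimits _{j=1}^{n}x_{j}^{\bar{\beta}}\lambda_{j}\leq x,\,\sum\nolimits_{j=1}^{n}\lambda_{j} \geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n\right. \right\}. \) Since \(\tilde {x}\in T_{\hat{\beta}},\) there exist \(\tilde{\lambda}_{1},\,\tilde{\lambda} _{2},\ldots,\tilde{\lambda}_{n}\) that satisfy
$$ \sum\limits_{j=1}^{n}x_{j}^{\hat{\beta}}\tilde{\lambda}_{j}\leq\tilde{x},\quad\sum\limits_{j=1}^{n}\tilde{\lambda}_{j}\geq1,\quad\tilde{\lambda}_{j}\geq0,\,j=1,\ldots,n. $$
Furthermore, since \(\sum\nolimits_{j=1}^{n}\tilde{\lambda}_{j}\geq1,\) we have \(( \tilde{\lambda}_{1},\,\tilde{\lambda}_{2},\ldots,\tilde{\lambda} _{n}) \neq0.\) Moreover, since \(a_{j}<b_{j}\) and \(L<\bar{\beta}<\hat{\beta},\) we have \(0<x_{j}^{\bar{\beta}}<x_{j}^{\hat{\beta}},\) j = 1, …, n. In summary,
$$ 0<\sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\tilde{\lambda}_{j}<\sum\limits_{j=1}^{n}x_{j}^{\hat{\beta}}\tilde{\lambda}_{j}\leq\tilde{x}. $$
It follows that there exist solutions to the following system of inequalities:
$$ \sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\lambda_{j}\leq\theta\tilde{x},\quad\sum\limits_{j=1}^{n}\lambda_{j}\geq1,\quad\lambda_{j}\geq0,\,j=1,\ldots,n,\quad\theta<1. $$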
As a result, \(\hat{\theta}( \bar{\beta}) <1\) (i.e., the optimal objective function value of \(( P_{\bar{\beta}})\) is less than one). □
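The existence claim in this proof can be made explicit. As a worked step, assuming (as elsewhere in this appendix) that vector inequalities hold componentwise, and writing \(( \cdot) _{i}\) for the ith of the m components, take \(\tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{n}\) as in the proof. Because \(( \tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{n}) \neq0,\) \(0<x_{j}^{\bar{\beta}}<x_{j}^{\hat{\beta}},\) and \(\sum\nolimits_{j=1}^{n}x_{j}^{\hat{\beta}}\tilde{\lambda}_{j}\leq\tilde{x},\) we get \(0<\sum\nolimits_{j=1}^{n}x_{j}^{\bar{\beta}}\tilde{\lambda}_{j}<\tilde{x},\) so every \(\tilde{x}_{i}>0\) and
$$ \theta'=\max_{1\leq i\leq m}\frac{\left( \sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\tilde{\lambda}_{j}\right) _{i}}{\tilde{x}_{i}}<1,\qquad\sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\tilde{\lambda}_{j}\leq\theta'\tilde{x}. $$
Hence \(( \theta',\,\tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{n}) \) is feasible for \(( P_{\bar{\beta}}) ,\) and \(\hat{\theta}( \bar{\beta}) \leq\theta'<1.\)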
Lemma 2
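If \(L<\beta,\, \hat{x}>0\) and \(\hat{x}\notin T_{\beta},\) then the optimal objective function value of the following linear program is greater than one, i.e., \(\hat{\theta}( \beta) >1:\)
$$ \begin{array}{lll} && \hat{\theta}( \beta) =\min\theta,\\ \left( P_{\beta}\right) & \hbox{s.t.}& \sum\limits_{j=1}^{n}x_{j}^{\beta}\lambda_{j}\leq\theta\hat{x},\\ && \sum\limits_{j=1}^{n}\lambda_{j}\geq1,\\ && \lambda_{j}\geq0,\quad j=1,\ldots,n. \end{array} $$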
Proof
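Let \(\hat{\lambda}_{1},\,\hat{\lambda}_{2} ,\ldots,\hat{\lambda}_{n}\) denote an optimal solution to \(( P_{\beta }) \) and let \(\hat{\theta}=\hat{\theta}( \beta).\) If \(\hat{\theta}\leq1,\) then, since \(\hat{x}>0,\)
$$ \sum\limits_{j=1}^{n}x_{j}^{\beta}\hat{\lambda}_{j}\leq\hat{\theta}\hat{x}\leq\hat{x},\quad\sum\limits_{j=1}^{n}\hat{\lambda}_{j}\geq1,\quad\hat{\lambda}_{j}\geq0,\,j=1,\ldots,n. $$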
That is, \(\hat{x}\in T_{\beta},\) which is a contradiction. □
In what follows, we prove Theorem 2, first part (i) and then part (ii).
Theorem 2
Let \(\hat{x}\in\hat{T}\cap {\rm Int}\,\left\{ x|\sum\nolimits_{j=1}^{n}x_{j}^{L}\lambda_{j}\leq x,\,\sum\nolimits_{j=1}^{n}\lambda_{j} \geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n\right\},\) and \(\hat{\theta}( \beta) \) be the quantile function of DMU-\(\hat{x}.\) Then,
(i) \(\hat{\theta}( \beta) \) is a continuous function defined over (L, +∞).
(ii) \(\hat{\theta}( \beta) \) is a strictly monotonically increasing function over (L, +∞).
Proof
(i) Consider the following linear program \(( P_{\beta}):\)
$$ \begin{array}{lll} && \hat{\theta}( \beta) =\min\theta,\\ \left( P_{\beta}\right) & \hbox{s.t.}& \sum\limits_{j=1}^{n} x_{j}^{\beta}\lambda_{j}\leq\theta\hat{x},\\ && \sum\limits_{j=1}^{n}\lambda_{j}\geq1,\\ && \lambda_{j}\geq0,\quad j=1,\ldots,n. \end{array} $$
Equivalently, since \(x_{j}^{\beta}=a_{j}+\beta( b_{j}-a_{j}),\)
$$ \begin{array}{lll} && \hat{\theta}( \beta) =\min\theta,\\ \left( P_{\beta}\right) & \hbox{s.t.}& \sum\limits_{j=1}^{n}\left[ a_{j}+\beta\left( b_{j}-a_{j}\right) \right] \lambda_{j}\leq\theta\hat{x},\\ && \sum\limits_{j=1}^{n}\lambda_{j}\geq1,\\ && \lambda_{j}\geq0,\quad j=1,\ldots,n. \end{array} $$
By the stability theory of linear programming (Ying et al. 1975), the optimal objective function value of \(( P_{\beta}) ,\) \(\hat{\theta}(\beta),\) is a continuous function defined over (L, +∞).
(ii) Let \(L<\bar{\beta}<\hat{\beta},\) and consider the following problem \(( P_{\hat{\beta}}):\)
$$ \begin{array}{lll} && \hat{\theta}( \hat{\beta}) =\min\theta,\\ \left( P_{\hat{\beta}}\right) & \hbox{s.t.}& \sum\limits_{j=1} ^{n}x_{j}^{\hat{\beta}}\lambda_{j}\leq\theta\hat{x},\\ && \sum\limits_{j=1}^{n}\lambda_{j}\geq1,\\ && \lambda_{j}\geq0,\quad j=1,\ldots,n. \end{array} $$
It is clear that \(\hat{\theta}( \hat{\beta}) \hat{x}\in T_{\hat{\beta}}.\) Consider also the following problem \(( \tilde{P}_{\bar{\beta}}):\)
$$ \begin{array}{lll} && \tilde{\theta}=\min\theta,\\ \left( \tilde{P}_{\bar{\beta}}\right) &\hbox{s.t.}& \sum\limits_{j=1}^{n}x_{j}^{\bar{\beta}}\lambda_{j}\leq\theta( \hat{\theta}( \hat{\beta}) \hat{x}), \\ && \sum\limits_{j=1}^{n}\lambda_{j}\geq1,\\ && \lambda_{j}\geq0,\quad j=1,\ldots,n. \end{array} $$
Let \(\tilde{\theta},\,\tilde{\lambda}_{1},\,\tilde{\lambda} _{2},\ldots,\tilde{\lambda}_{n}\) denote the optimal solution to \(( \tilde{P}_{\bar{\beta}}) .\) It is easy to check that \(\tilde{\theta} >0\) and \(\hat{\theta}( \hat{\beta}) >0.\) Furthermore, since \(\hat{\theta}( \hat{\beta}) \hat{x}\in T_{\hat{\beta}}\) and \(L<\bar{\beta}<\hat{\beta},\) Lemma 1 with \(\tilde{x}=\hat{\theta}( \hat{\beta}) \hat{x}\) gives \(\tilde {\theta}<1.\) Moreover, \(( \tilde{\theta}\hat{\theta}( \hat{\beta}) ,\,\tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{n}) \) is a feasible solution to \(( P_{\bar{\beta}}) ,\) whose optimal objective function value is \(\hat{\theta}( \bar{\beta}) ;\) hence \(\hat{\theta}( \bar{\beta}) \leq\tilde{\theta}\hat{\theta }( \hat{\beta}) <\hat{\theta}( \hat{\beta}). \)□
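Theorem 2 can be made concrete in the single-input case (m = 1), where \(( P_{\beta}) \) has a closed form. Assuming \(\hat{x}>0\) and every \(x_{j}^{\beta}=a_{j}+\beta( b_{j}-a_{j}) >0,\) the minimum of \(\sum\nolimits_{j}x_{j}^{\beta}\lambda_{j}\) over \(\lambda_{j}\geq0,\,\sum\nolimits_{j}\lambda_{j}\geq1\) is \(\min_{j}x_{j}^{\beta},\) so \(\hat{\theta}( \beta) =\min_{j}x_{j}^{\beta}/\hat{x}.\) This is a minimum of linear functions of β with positive slopes \(b_{j}-a_{j},\) hence continuous and strictly increasing, as the proof of part (ii) shows. The Python sketch below is an illustration under these assumptions only; for m > 1 the linear program must be solved. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def theta_hat_1d(intervals: Sequence[Tuple[float, float]], x_hat: float, beta: float) -> float:
    """Optimal value of (P_beta) when each DMU has a single input (m = 1).

    Assumes x_hat > 0 and x_j^beta = a_j + beta * (b_j - a_j) > 0 for all j.
    Then the minimum of sum_j x_j^beta * lam_j over {lam >= 0, sum lam >= 1}
    is min_j x_j^beta, so theta_hat(beta) = min_j x_j^beta / x_hat.
    """
    points = [a + beta * (b - a) for a, b in intervals]
    if x_hat <= 0 or min(points) <= 0:
        raise ValueError("sketch assumes x_hat > 0 and x_j^beta > 0 for all j")
    return min(points) / x_hat


# Hypothetical single-input interval data (a_j, b_j) with a_j < b_j.
intervals = [(1.0, 3.0), (2.0, 2.5), (0.5, 4.0)]
x_hat = 2.0
for beta in (0.0, 0.5, 1.0):
    print(beta, theta_hat_1d(intervals, x_hat, beta))
# 0.0 0.25
# 0.5 1.0
# 1.0 1.25
```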
Appendix 3: Existence of β*
The following Theorem 3 shows the existence of β*.
Theorem 3
Let \(\hat{x}\in\hat{T}\cap {\rm Int}\,\left\{ x|\sum\nolimits_{j=1}^{n}x_{j}^{L}\lambda_{j}\leq x,\,\sum\nolimits_{j=1}^{n}\lambda_{j} \geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n\right\},\) and \(\hat{\theta}( \beta)\) be the quantile function of DMU-\(\hat{x}.\) Then, there exists β* ∈ (L, +∞) such that the optimal objective function value of problem \(( P_{\beta}) \) at \(\beta=\beta^{\ast}\) is equal to one, i.e., \(\hat{\theta}(\beta^{\ast})=1.\)
Proof
(i) If \(\hat{x}\) is located on the frontier of \(T_{1},\) then \(\hat{\theta}( 1) =1,\) i.e., β* = 1.
(ii) If \(\hat{x}\) is not located on the frontier of \(T_{1}\) and \(\hat{x}\in{\rm Int}\,T_{1},\) then there exist \(\lambda_{j}^{0}\geq 0,\,j=1,\ldots,n,\,\sum\nolimits_{j=1}^{n}\lambda_{j}^{0}\geq1,\) such that
$$ \sum\limits_{j=1}^{n}\left[ a_{j}+1\times\left( b_{j}-a_{j}\right) \right] \lambda_{j}^{0}=\sum\limits_{j=1}^{n}b_{j}\lambda_{j}^{0}<\hat{x}, \tag{1} $$
and hence \(\hat{\theta}( 1) <1.\) Let
$$ \hat{\beta}>\max\left\{ \max_{1\leq i\leq m;\,1\leq j\leq n}\left\{ \left( \hat{x}_{i}-a_{ij}\right) /\left( b_{ij}-a_{ij}\right) \right\} ,\,L\right\}, $$
where \(\hat{x}_{i},\,a_{ij},\) and \(b_{ij}\) denote the ith components of \(\hat{x},\,a_{j},\) and \(b_{j},\) respectively. Then,
$$ a_{j}+\hat{\beta}\left( b_{j}-a_{j}\right) >\hat{x},\quad j=1,\ldots,n. $$
Therefore, for any \(\lambda_{j}\geq0,\,j=1,\ldots,n,\,\sum\nolimits_{j=1}^{n} \lambda_{j}\geq1,\) we have
$$ \sum\limits_{j=1}^{n}x_{j}^{\hat{\beta}}\lambda_{j}>\hat{x}. \tag{2} $$
From (2), \(\hat{x}\notin T_{\hat{\beta}},\) and from Lemma 2, \(\hat{\theta}( \hat{\beta}) >1.\) Since \(\hat{\theta }( 1) <1,\,\hat{\theta}( \hat{\beta}) >1,\,\hat{\beta}\in(L,\,+\infty), \) and \(\hat{\theta}( \beta) \) is continuous over (L, +∞) by Theorem 2(i), the intermediate value theorem yields \(\beta^{\ast}\) between 1 and \(\hat{\beta},\) and hence in \((L,\,+\infty), \) such that \(\hat{\theta}( \beta^{\ast}) =1.\)
(iii) If \(\hat{x}\notin T_{1},\) then, from Lemma 2, \(\hat{\theta}( 1) >1.\) In addition, since
$$ \hat{x}\in\text{Int}\,\left\{x |\sum\limits_{j=1}^{n}x_{j}^{L}\lambda_{j}\leq x,\,\sum\limits_{j=1}^{n}\lambda_{j}\geq1,\,\lambda_{j}\geq0,\,j=1,\ldots,n\right\}, $$
there exist \(\lambda_{j}^{0}\geq0,\,j=1,\ldots,n,\,\sum\nolimits_{j=1}^{n} \lambda_{j}^{0}\geq1,\) such that
$$ \sum\limits_{j=1}^{n}\left[a_{j}+L\times\left( b_{j}-a_{j}\right) \right]\lambda_{j}^{0}=\sum\limits_{j=1}^{n}x_{j}^{L}\lambda_{j}^{0}<\hat{x}. $$
Therefore, since \(x_{j}^{\beta}\) depends continuously on β, there exists \(\hat{\beta}>L\) such that
$$ \sum\limits_{j=1}^{n}x_{j}^{\hat{\beta}}\lambda_{j}^{0}<\hat{x}. $$
That is, \(\hat{x}\in {\rm Int}\,T_{\hat{\beta}},\) and thus \(\hat{\theta}( \hat{\beta}) <1.\) Consequently, since \(\hat{\theta}( 1) >1,\,\hat{\theta}( \hat{\beta}) <1,\,\hat{\beta}\in(L,\,+\infty), \) and \(\hat{\theta}( \beta) \) is continuous over (L, +∞) by Theorem 2(i), the intermediate value theorem yields β* between \(\hat{\beta}\) and 1, and hence in (L, +∞), such that \(\hat{\theta }( \beta^{\ast}) =1.\) □
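Theorems 2 to 4 together suggest computing β* by bisection: \(\hat{\theta}( \beta) \) is continuous and increasing, a bracket with \(\hat{\theta}<1\) at one end and \(\hat{\theta}>1\) at the other exists, and the root is unique. The Python sketch below assumes the single-input closed form \(\hat{\theta}( \beta) =\min_{j}x_{j}^{\beta}/\hat{x}\) (valid when \(\hat{x}>0\) and all \(x_{j}^{\beta}>0\)), takes the upper bracket from the bound used in part (ii), and uses a hypothetical lower bracket β = 0, which for real data must satisfy \(\hat{\theta}( 0) <1\) and lie in (L, +∞). All names and data are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def theta_hat_1d(intervals: Sequence[Tuple[float, float]], x_hat: float, beta: float) -> float:
    """Single-input closed form of (P_beta): min_j x_j^beta / x_hat (positive data assumed)."""
    points = [a + beta * (b - a) for a, b in intervals]
    if x_hat <= 0 or min(points) <= 0:
        raise ValueError("sketch assumes x_hat > 0 and x_j^beta > 0 for all j")
    return min(points) / x_hat


def upper_bracket(intervals: Sequence[Tuple[float, float]], x_hat: float,
                  margin: float = 1.0) -> float:
    """A beta with theta_hat(beta) > 1, from the bound in Theorem 3(ii) with m = 1."""
    return max((x_hat - a) / (b - a) for a, b in intervals) + margin


def find_beta_star(theta: Callable[[float], float], lo: float, hi: float,
                   tol: float = 1e-12) -> float:
    """Bisection for theta(beta) = 1, assuming theta is continuous and increasing
    on [lo, hi] with theta(lo) < 1 < theta(hi)."""
    if not theta(lo) < 1.0 < theta(hi):
        raise ValueError("bracket must satisfy theta(lo) < 1 < theta(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if theta(mid) < 1.0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
    return 0.5 * (lo + hi)


intervals = [(1.0, 3.0), (2.0, 2.5), (0.5, 4.0)]  # hypothetical (a_j, b_j)
x_hat = 2.0
beta_star = find_beta_star(lambda beta: theta_hat_1d(intervals, x_hat, beta),
                           lo=0.0, hi=upper_bracket(intervals, x_hat))
print(round(beta_star, 6))  # 0.5
```

In this single-input setting one can also check that \(\beta^{\ast}=\max_{j}( \hat{x}-a_{j}) /( b_{j}-a_{j}) ,\) which the example reproduces. For m > 1, each bisection step would instead solve the linear program \(( P_{\beta}) \) with an LP solver.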
Appendix 4: Uniqueness of β*
The following Theorem 4 shows the uniqueness of β*.
Theorem 4
Let \(b_{j}>a_{j},\,j=1,\ldots,n,\, L<\bar{\beta} <\hat{\beta},\) and \(\hat{x}\in\hat{T}.\) Then
(i) There is no intersection between the frontiers of \(T_{\hat{\beta }}\) and \(T_{\bar{\beta}}.\)
(ii) The quantile of DMU-\(\hat{x},\) i.e., β*, is uniquely determined.
Proof
The proof of (i) is by contradiction. Suppose that there exists \(x^{0}\in\Re_{+}^{m}\) located on the frontiers of both \(T_{\hat{\beta}}\) and \(T_{\bar{\beta}}.\) Then, applying Theorem 2(ii) to the quantile function of DMU-\(x^{0},\) we obtain \(1=\hat{\theta }( \bar{\beta}) <\hat{\theta}( \hat{\beta}) =1,\) which is a contradiction. Hence, there is no intersection between the frontiers of \(T_{\hat{\beta}}\) and \(T_{\bar{\beta}}.\)
The proof of (ii) is also by contradiction. Assume that DMU-\(\hat{x}\) has two quantiles, \(\beta_{1}^{\ast}\) and \(\beta_{2}^{\ast}.\) Without loss of generality, assume that \(L<\beta_{1} ^{\ast}<\beta_{2}^{\ast}.\) Since both \(\beta_{1}^{\ast}\) and \(\beta_{2}^{\ast }\) are quantiles of DMU-\(\hat{x},\,\hat{\theta}( \beta_{1}^{\ast }) =\hat{\theta}( \beta_{2}^{\ast}) =1.\) However, by Theorem 2(ii), \(\hat{\theta}( \beta_{1}^{\ast}) <\hat{\theta}( \beta_{2}^{\ast}), \) which is a contradiction. Hence, β* is uniquely determined. □
Cite this article
Wei, Q., Chang, TS. & Han, S. Quantile–DEA classifiers with interval data. Ann Oper Res 217, 535–563 (2014). https://doi.org/10.1007/s10479-014-1565-y