Abstract
Effective methods for determining the boundary of the normal class are very useful for detecting anomalies in commercial or security applications—a problem known as anomaly detection. This contribution proposes a nonparametric frontier-based classification (NPFC) method for anomaly detection. By relaxing the commonly used convexity assumption in the literature, a nonconvex-NPFC method is constructed and the nonconvex nonparametric frontier turns out to provide a more conservative boundary enveloping the normal class. By reflecting on the monotonic relation between the characteristic variables and the membership, the proposed NPFC method is in a more general form since both input-like and output-like characteristic variables are incorporated. In addition, by allowing some of the training observations to be misclassified, the convex- and nonconvex-NPFC methods are extended from a hard nonparametric frontier to a soft one, which also provides a more conservative boundary enclosing the normal class. Both simulation studies and a real-life data set are used to evaluate and compare the proposed NPFC methods to some well-established methods in the literature. The results show that the proposed NPFC methods have competitive classification performance and have consistent advantages in detecting abnormal samples, especially the nonconvex-NPFC methods.





Similar content being viewed by others
Notes
The data set is collected from the Statlib data archive at: http://lib.stat.cmu.edu/datasets/
References
Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Futur Gener Comput Syst 55:278–288
Akbani R, Kwek S, Japkowicz N (2004) Applying support vector machines to imbalanced datasets. In: Boulicaut J, Esposito F, Giannotti F, Pedreschi D (eds) Machine learning: ECML 2004. Springer, Berlin, pp 39–50
Al-Hashedi KG, Magalingam P (2021) Financial fraud detection applying data mining techniques: a comprehensive review from 2009 to 2019. Comput Sci Rev 40:100402
Alam S, Sonbhadra SK, Agarwal S, Nagabhushan P (2020) One-class support vector classifiers: a survey. Knowl-Based Syst 196:105754
Andersen P, Petersen NC (1993) A procedure for ranking efficient units in data envelopment analysis. Manage Sci 39(10):1261–1264
Aparicio J, Esteve M, Rodriguez-Sala JJ, Zofio JL (2021) The estimation of productive efficiency through machine learning techniques: efficiency analysis trees. In: Zhu J, Charles V (eds) Data-enabled analytics: DEA for big data. Springer, Cham, pp 51–92
Banker RD, Charnes A, Cooper WW (1984) Some models for estimating technical and scale inefficiencies in data envelopment analysis. Manage Sci 30(9):1078–1092
Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud: a comparative study. Decis Support Syst 50(3):602–613
Briec W (1997) A graph-type extension of Farrell technical efficiency measure. J Prod Anal 8(1):95–110
Briec W, Kerstens K, Van de Woestyne I (2016) Congestion in production correspondences. J Econ 119(1):65–90
Briec W, Kerstens K, Van de Woestyne I (2018) Hypercongestion in production correspondences: an empirical exploration. Appl Econ 50(27):2938–2956
Chambers R, Chung Y, Färe R (1998) Profit, directional distance functions, and nerlovian efficiency. J Optim Theory Appl 98(2):351–364
Chang D, Kuo Y (2008) An Approach for the two-group discriminant analysis: an application of DEA. Math Comput Model 47(9–10):970–981
Cherchye L, Kuosmanen T, Post T (2001) FDH directional distance functions with an application to European commercial banks. J Prod Anal 15(3):201–215
Clifton L, Clifton DA, Zhang Y, Watkinson P, Tarassenko L, Yin H (2014) Probabilistic novelty detection with support vector machines. IEEE Trans Reliab 63(2):455–467
Cox LH, Johnson MM, Kafadar K (1982) Exposition of statistical graphics technology. In: Proceedings of the statistical computation section, American Statistical Association, Washington, D.C, pp 55–56
da Silva DB, Schmidt D, da Costa CA, da Rosa Righi R, Eskofier B (2021) Deepsigns: a predictive model based on deep learning for the early detection of patient health deterioration. Expert Syst Appl 165:113905
Deprins D, Simar L, Tulkens H (1984) Measuring labor efficiency in post offices. In: Marchand M, Pestieau P, Tulkens H (eds) The performance of public enterprises: concepts and measurements. North Holland, Amsterdam, pp 243–268
El-Yaniv R, Nisenson M (2006) Optimal single-class classification strategies. Adv Neural Inform Process Syst 19
Emrouznejad A, Banker RD, Neralic L (2019) Advances in data envelopment analysis: celebrating the 40th anniversary of DEA and the 100th anniversary of professor Abraham Charnes, Birthday. Eur J Op Res 278(2):365–367
Emrouznejad A, Yang G-L (2018) A survey and analysis of the first 40 years of scholarly literature in DEA: 1978–2016. Socioecon Plann Sci 61:4–8
Esteve M, Aparicio J, Rodriguez-Sala JJ, Zhu J (2023) Random forests and the measurement of super-efficiency in the context of free disposal hull. Eur J Oper Res 304(2):729–744
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Jiang T, Li Y, Xie W, Du Q (2020) Discriminative reconstruction constrained generative adversarial network for hyperspectral anomaly detection. IEEE Trans Geosci Remote Sens 58(7):4666–4679
Kaffash S, Azizi R, Huang Y, Zhu J (2020) A survey of data envelopment analysis applications in the insurance industry 1993–2018. Eur J Oper Res 284(3):801–813
Kerstens K, Sadeghi J, Toloo M, Van de Woestyne I (2022) Procedures for ranking technical and cost efficient units: with a focus on nonconvexity. Eur J Oper Res 300(1):269–281
Kerstens K, Van de Woestyne I (2011) Negative data in DEA: a simple proportional distance function approach. J Op Res Soc 62(7):1413–1419
Khan SS, Madden MG (2014) One-class classification: taxonomy of study and review of techniques. Knowl Eng Rev 29(3):345–374
Kim J-H (2009) Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal 53(11):3735–3745
Leon CF, Palacios F (2009) Evaluation of rejected cases in an acceptance system with data envelopment analysis and goal programming. J Op Res Soc 60(10):1411–1420
Lovell K, Pastor JT (1999) Radial DEA models without inputs or without outputs. Eur J Oper Res 118(1):46–51
Marcot BG, Hanea AM (2021) What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis? Comput Stat 36(3):2009–2031
Markou M, Singh S (2003) Novelty detection: A review-part 1: statistical approaches. Signal Process 83(12):2481–2497
Markou M, Singh S (2003) Novelty detection: a review-part 2: neural network based approaches. Signal Process 83(12):2499–2521
Moya MM, Koch MW, Hostetler LD (1993) One-class classifier networks for target recognition applications. In: World congress on neural networks, vol 3. Lawrence Erlbaum Associates, Portland, pp 797–801
Noble WS (2006) What is a support vector machine? Nat Biotechnol 24(12):1565–1567
Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput Surv 54(2):1–38
Park C, Huang JZ, Ding Y (2010) A computable plug-in estimator of minimum volume sets for novelty detection. Oper Res 58(5):1469–1480
Pendharkar P, Khosrowpour M, Rodger J (2000) Application of Bayesian network classifiers and data envelopment analysis for mining breast cancer patterns. J. Comput. Inform. Syst. 40(4):127–132
Pendharkar P, Rodger J, Yaverbaum G (1999) Association, statistical, mathematical and neural approaches for mining breast cancer patterns. Expert Syst Appl 17(3):223–232
Pendharkar PC (2002) A potential use of data envelopment analysis for the inverse classification problem. Omega 30(3):243–248
Qiu K (2022) Support Vector Data Description (SVDD Version 2.2), https://github.com/iqiukp/SVDD–MATLAB
Ruff L, Kauffmann JR, Vandermeulen RA, Montavon G, Samek W, Kloft M, Dietterich TG, Müller K-R (2021) A unifying review of deep and shallow anomaly detection. Proc IEEE 109(5):756–795
Schölkopf B, Williamson RC, Smola A, Shawe-Taylor J, Platt J (1999) Support vector method for novelty detection. In: Solla A, Müller K-R, Leen TK (eds) Advances in neural information processing systems, vol 12. MIT Press, Cambridge, Massachusetts, pp 582–588
Seiford L, Zhu J (1998) An acceptance system decision rule with data envelopment analysis. Comput Op Res 25(4):329–332
Sueyoshi T (2006) DEA-discriminant analysis: methodological comparison among eight discriminant analysis approaches. Eur J Oper Res 169(1):247–272
Tax DMJ, Duin RPW (1999) Support vector domain description. Pattern Recogn Lett 20(11–13):1191–1199
Tax DMJ, Juszczak P (2003) Kernel whitening for one-class classification. Int J Pattern Recognit Artif Intell 17(3):333–347
Troutt M, Rai A, Zhang A (1996) The potential use of DEA for credit applicant acceptance systems. Comput Op Res 23(4):405–408
Valero-Carreras D, Aparicio J, Guerrero NM (2021) Support vector frontiers: a new approach for estimating production functions through support vector machines. Omega 104:102490
Wu J, An Q, Liang L (2011) A modified super-efficiency DEA approach for solving multi-groups classification problems. Int J Comput Intell Syst 4(4):606–618
Yan H, Wei Q (2011) Data envelopment analysis classification machine. Inf Sci 181(22):5029–5041
Zhu N, Zhu C, Emrouznejad A (2021) A combined machine learning algorithms and DEA method for measuring and predicting the efficiency of chinese manufacturing listed companies. J Manage Sci Eng 6(4):435–448
Acknowledgements
We appreciate the constructive comments of the Editor-in-Chief, G. Voigt, and two anonymous referees, which helped to improve this contribution. The usual disclaimer applies. Q. Jin appreciates financial support of the National Natural Science Foundation of China (No. 72201127).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Discussions on the choices of cut-off super-efficiency
Appendix: Discussions on the choices of cut-off super-efficiency
In this contribution, a negative cut-off super-efficiency denoted by \(c_{\Lambda }\) is introduced to decide the noisy and less important training observations. A larger value of \(c_{\Lambda }\) means that more training observations will be identified as noisy and of low importance and thus, will be excluded from the construction of a soft nonparametric frontier.
For both the C and NC cases in Sect. 3.2, the simulation is executed for 100 times. Here, an example for the C case is extracted to show different C and NC soft nonparametric frontiers constructed from different choices of \(c_{\Lambda }\).
The resulted frontiers under different choices of \(c_{\Lambda }\) are displayed in Figs. 5 and 6. In every sub-figure, the normal training observations are represented by blue crosses. The training observations that are identified as noisy and of low importance are further marked by red circles. These training observations are excluded while constructing the corresponding soft nonparametric frontier. The soft nonparametric frontier is represented by the blue solid lines.
Similar observations can be derived from Figs. 5 and 6. With the increase in \(c_{\Lambda }\), more training observations are identified as noisy and of low importance. Accordingly, the soft nonparametric frontier becomes more conservative. In comparison with the C soft nonparametric frontier, the NC soft nonparametric frontier is more conservative, since more training observations are excluded.
A proper choice of \(c_{\Lambda }\) is important for identifying as accurately as possible noisy and less important training observations. In the following, the suggested way of deciding the value of \(c_{\Lambda }\) is explained with the above simulation example.
By solving model (7) under the C case for 700 normal training observations, 5 of them are identified as frontier observations. Their values of \(\delta _{\text {super},\Lambda ^\text {C}}^*\) ordered from largest to smallest are represented by blue diamonds in Fig. 7a. It is observed that 4 out of 5 frontier observations have a \(\delta _{\text {super},\Lambda ^\text {C}}^*\) larger than \(-\)0.1. Only one frontier observations has \(\delta _{\text {super},\Lambda ^\text {C}}^*=-0.1537\). Thus, \(c_{\Lambda ^\text {C}}=-0.1\) is chosen.
Similarly, by solving model (7) under the NC case for the same normal training observations, 11 of them are identified as frontier observations. Their values of \(\delta _{\text {super},\Lambda ^\text {NC}}^*\) ordered from largest to smallest are as shown in Fig. 7b. It is observed that 10 out of 11 frontier observations have the value of \(\delta _{\text {super},\Lambda ^\text {NC}}^*\) larger than \(-\)0.2. Only one frontier observations has \(\delta _{\text {super},\Lambda ^\text {NC}}^*=-0.2494\). Thus, \(c_{\Lambda ^\text {NC}}=-0.2\) is chosen.
One suggested way of deciding the value of \(c_{\Lambda }\) is illustrated. Note that we do not intend to suggest that this is the optimal way. It is worthwhile for future researches to explore the other methods of deciding a proper \(c_{\Lambda }\).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jin, Q., Kerstens, K. & Van de Woestyne, I. Convex and nonconvex nonparametric frontier-based classification methods for anomaly detection. OR Spectrum 46, 1213–1239 (2024). https://doi.org/10.1007/s00291-024-00751-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00291-024-00751-5