Abstract
Support vector data description (SVDD) has been widely used in outlier detection. Conventional SVDD employs the hinge loss function, so the sphere classifier is determined by only a small number of data points near the sphere surface (the support vectors), which makes it sensitive to noise and unstable under re-sampling. In this paper, we propose a novel support vector data description method with pinball loss (pin-SVDD). In our method, all the training data, including the points lying inside the sphere, contribute to the sphere classifier, so a small amount of noisy data has little influence on the result. Pin-SVDD has two main merits. (1) Unlike conventional SVDD, which employs the hinge loss and is sensitive to noise, pin-SVDD applies the pinball loss, which makes it more robust to noise and additionally minimizes the scatter around the sphere center. (2) Unlike existing anti-noise SVDD methods, which are based on varying instance weights and require extra preprocessing time to generate those weights, pin-SVDD needs no preprocessing and has the same time complexity as conventional SVDD. Hence, pin-SVDD is more robust than conventional SVDD at no additional computational cost. Experimental results show that pin-SVDD achieves better outlier detection performance than state-of-the-art SVDD-based methods while requiring less training time.
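The contrast between the two losses can be sketched numerically. Below is a minimal illustration (not the paper's formulation) of a hinge-type SVDD penalty versus a pinball-type penalty applied to t = ||x − a||² − R², the signed squared distance of a point from the sphere surface; the parameter `tau` is assumed to play the role of the pinball slope on the inside of the sphere:

```python
import numpy as np

def hinge(t):
    """Hinge-type loss used by conventional SVDD:
    only points outside the sphere (t > 0) are penalized,
    so points inside contribute nothing to the solution."""
    return np.maximum(0.0, t)

def pinball(t, tau=0.5):
    """Pinball-type loss: points inside the sphere (t < 0)
    also incur a penalty scaled by tau, so every training
    point influences the classifier and the within-sphere
    scatter is penalized as well."""
    return np.where(t >= 0.0, t, -tau * t)

# A point outside the sphere is penalized identically by both losses,
# but only the pinball loss penalizes a point deep inside the sphere.
print(hinge(2.0), hinge(-2.0))          # outside vs. inside, hinge
print(pinball(2.0), pinball(-2.0))      # outside vs. inside, pinball
```

This is why, as the abstract states, all training data become decisive under the pinball loss: the flat zero region of the hinge loss inside the sphere is replaced by a sloped penalty.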
Notes
Since the conventional SVDD employs the hinge loss function, the conventional SVDD is also referred to as hinge loss SVDD in this paper.
The UCI datasets used in our experiments are available online from http://homepage.tudelft.nl/n9d04/occ/index.html
Acknowledgements
The authors would like to thank the reviewers for their very useful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant 61876044 and Grant 62076074, in part by the Guangdong Natural Science Foundation under Grant 2020A1515010670 and Grant 2020A1515011501, and in part by the Science and Technology Planning Project of Guangzhou under Grant 202002030141.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhong, G., Xiao, Y., Liu, B. et al. Pinball loss support vector data description for outlier detection. Appl Intell 52, 16940–16961 (2022). https://doi.org/10.1007/s10489-022-03237-5