Abstract
Novelty detection techniques are widely used to discover interesting patterns in practical applications. However, real-world training data are often contaminated by unknown anomalous examples (outliers), which degrade the resulting detectors. To alleviate this problem, this paper proposes a robust novelty detection framework based on ensemble learning. In contrast to traditional parallel outlier ensembles, which rely solely on variance reduction, our framework addresses both bias and variance. Specifically, we reduce the bias induced by unknown outliers through an iterative mechanism: a weighting scheme combines the result of the current iteration with that of the previous one, and by gradually removing outliers from the training set, the performance of the detector is improved. In addition, the base detectors from all iterations are aggregated by the same weighting scheme to achieve variance reduction. Moreover, a flexible function that provides a reference ground truth is proposed so that the framework remains effective on different types of data sets. Experiments on 15 benchmark data sets verify its superiority over parallel ensembles and single models, and a case study on 10 data sets from a real-world wind tunnel system confirms the advantage of our detector over several competitors.
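The iterative mechanism described above can be illustrated with a minimal sketch. This is a hypothetical toy version, not the paper's algorithm: it substitutes a simple centroid-distance score for the base detector, assumes that later (cleaner) iterations receive larger weights, and trims a fixed fraction of the most anomalous training points each round.

```python
import numpy as np

def iterative_ensemble_scores(X, n_iters=5, trim_frac=0.05):
    """Toy illustration of an iterative bias-reduction ensemble.

    Each round fits a centroid-distance detector (a stand-in for any
    base detector), scores every point, removes the most anomalous
    fraction of the remaining training data, and accumulates the
    per-round scores with increasing weights (assumed scheme: later,
    cleaner rounds count more).
    """
    train = X.copy()
    combined = np.zeros(len(X))
    total_w = 0.0
    for t in range(n_iters):
        mu = train.mean(axis=0)
        scores = np.linalg.norm(X - mu, axis=1)  # outlier score for every point
        w = t + 1.0                              # assumed weight: later rounds weigh more
        combined += w * scores
        total_w += w
        # drop the trim_frac most anomalous points still in the training set,
        # so the next round's detector is fit on cleaner data (bias reduction)
        train_scores = np.linalg.norm(train - mu, axis=1)
        cutoff = np.quantile(train_scores, 1.0 - trim_frac)
        train = train[train_scores <= cutoff]
    return combined / total_w
```

Under these assumptions, a point far from the bulk of the data receives a high combined score even though it contaminated the first rounds of training, since later rounds, fit on progressively cleaner data, dominate the weighted aggregate.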
Notes
In this paper, we make a subtle distinction between outlier detection and novelty detection: outlier detection refers to mining abnormal patterns that exist within the given database, while novelty detection refers to identifying unseen abnormal patterns outside the given database.
Here the terms “data description” and “one-class classification” are used interchangeably.
We underline the term “unknown outliers” because labeled abnormal data (known outliers) would, by contrast, be beneficial for constructing normal patterns.
In this paper, the positive class denotes the target class, and the negative class denotes the outlier class.
Cite this article
Wang, B., Wang, W., Wang, N. et al. A robust novelty detection framework based on ensemble learning. Int. J. Mach. Learn. & Cyber. 13, 2891–2908 (2022). https://doi.org/10.1007/s13042-022-01569-9