A robust novelty detection framework based on ensemble learning

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Novelty detection techniques have been used extensively to discover interesting data patterns in practical applications. However, real-world training data are often contaminated by unknown anomalous examples (outliers), which degrades the resulting detectors. To alleviate this problem, this paper proposes a robust novelty detection framework based on ensemble learning. In contrast to traditional parallel outlier ensembles, which rely on variance reduction, our ensemble framework considers both bias and variance. Specifically, we aim to reduce the bias induced by unknown outliers with an iterative mechanism, in which a weighting scheme combines the result of the current iteration with that of the previous iteration. By gradually removing outliers from the training set, the detector's performance can be improved. In addition, the base detectors from all iterations are aggregated by the weighting scheme to achieve variance reduction. Moreover, we propose a flexible function that provides a reference ground truth, so that the detection framework is effective on different types of data sets. We conduct experiments on 15 benchmark data sets to verify the framework's superiority over parallel ensembles and single models. A case study is also carried out on 10 data sets from a real-world wind tunnel system. The experimental results show that our detector outperforms several competitors.
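The abstract describes the iterative mechanism only at a high level, so the following Python/NumPy code is a hypothetical sketch, not the authors' method. It assumes a k-nearest-neighbour distance base detector, z-score normalisation of each base detector's scores, the exponential weighting `alpha * current + (1 - alpha) * previous` to combine consecutive iterations, and exclusion of a fixed `contamination` fraction of training points at every iteration. The class name `IterativeEnsembleSketch` and all of its parameters are invented for illustration, and the paper's reference ground-truth function is not modelled.

```python
import numpy as np


def _knn_mean_dist(queries, ref, k):
    """Mean Euclidean distance from each query to its k nearest points in ref."""
    d = np.linalg.norm(queries[:, None, :] - ref[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


class IterativeEnsembleSketch:
    """Hypothetical iterative, weighted novelty-detection ensemble.

    Assumed (not taken from the paper): kNN-distance base detector,
    z-score normalisation, exponential weighting with factor `alpha`,
    and exclusion of a fixed `contamination` fraction per iteration.
    """

    def __init__(self, k=5, alpha=0.5, contamination=0.05, n_iter=5):
        self.k = k
        self.alpha = alpha
        self.contamination = contamination
        self.n_iter = n_iter

    def _combine(self, previous, current):
        # Weight the current iteration's score against the running combination.
        if previous is None:
            return current
        return self.alpha * current + (1.0 - self.alpha) * previous

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        np.fill_diagonal(D, np.inf)  # a training point is never its own neighbour
        keep = np.ones(len(X), dtype=bool)
        self.detectors_, combined = [], None
        for _ in range(self.n_iter):
            # Base detector built only on the points currently believed to be clean.
            raw = np.sort(D[:, keep], axis=1)[:, : self.k].mean(axis=1)
            mu, sd = raw[keep].mean(), raw[keep].std() + 1e-12
            self.detectors_.append((X[keep].copy(), mu, sd))
            combined = self._combine(combined, (raw - mu) / sd)
            # Exclude the highest-scoring points when building the next detector.
            self.threshold_ = np.quantile(combined, 1.0 - self.contamination)
            keep = combined <= self.threshold_
        self.keep_ = keep
        return self

    def decision_function(self, Q):
        """Aggregate all base detectors with the same weighting (larger = more novel)."""
        Q = np.asarray(Q, dtype=float)
        combined = None
        for ref, mu, sd in self.detectors_:
            z = (_knn_mean_dist(Q, ref, self.k) - mu) / sd
            combined = self._combine(combined, z)
        return combined

    def predict(self, Q):
        # +1 = target (positive) class, -1 = outlier (negative) class.
        return np.where(self.decision_function(Q) <= self.threshold_, 1, -1)


# Usage: 200 target points plus 10 planted "unknown outliers" on a ring.
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 2))
angles = np.linspace(0.0, 2 * np.pi, 10, endpoint=False)
planted = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([normal, planted])
model = IterativeEnsembleSketch().fit(X)
print(model.keep_[-10:].any())                             # False
print(model.predict([[0.0, 0.0], [10.0, 10.0]]).tolist())  # [1, -1]
```

Because `decision_function` replays the same recursion over the stored base detectors, the latest detector receives weight `alpha`, each earlier one is discounted by a further factor of `1 - alpha`, and the first receives `(1 - alpha)**(n_iter - 1)`, so the weights sum to one.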


Figs. 1–11 (images not included)

Notes

  1. In this paper, we make a subtle distinction between outlier detection and novelty detection. Outlier detection refers to mining abnormal patterns that exist in the given database, whereas novelty detection refers to identifying unseen abnormal patterns that lie outside the given database.

  2. Here the terms “data description” and “one-class classification” are used interchangeably.

  3. We emphasize the term “unknown outliers” because labeled abnormal data (known outliers) would, by contrast, be beneficial for constructing normal patterns.

  4. In this paper, the positive class denotes the target class, and the negative class denotes the outlier class.
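To make the distinction in Note 1 and the label convention in Note 4 concrete, the following hypothetical sketch uses a simple k-nearest-neighbour distance score; the notes do not prescribe any particular detector, so this choice is an assumption. Outlier detection scores points that already exist in the database (skipping each point's zero distance to itself), while novelty detection scores unseen points against the database. The function names, the toy data, and the threshold of 1.0 are illustrative only.

```python
import numpy as np


def knn_score(reference, queries, k=3, same_set=False):
    """Mean distance to the k nearest reference points (larger = more abnormal).

    same_set=True is the outlier-detection case: queries are the database
    itself, so each point's zero distance to itself is skipped.
    """
    d = np.sort(np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=2), axis=1)
    return d[:, 1:k + 1].mean(axis=1) if same_set else d[:, :k].mean(axis=1)


def to_labels(scores, threshold):
    """Note 4 convention: +1 = positive (target) class, -1 = negative (outlier) class."""
    return np.where(scores <= threshold, 1, -1)


database = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
threshold = 1.0  # hypothetical cut-off for this toy example

# Outlier detection: mine abnormal points that already exist in the database.
in_db = knn_score(database, database, k=3, same_set=True)
# Novelty detection: score unseen points against the given database.
unseen = np.array([[0.05, 0.05], [4.0, -4.0]])
new = knn_score(database, unseen, k=3)

print(to_labels(in_db, threshold).tolist())  # [1, 1, 1, 1, -1]
print(to_labels(new, threshold).tolist())    # [1, -1]
```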

Author information

Correspondence to Biao Wang.

About this article

Cite this article

Wang, B., Wang, W., Wang, N. et al. A robust novelty detection framework based on ensemble learning. Int. J. Mach. Learn. & Cyber. 13, 2891–2908 (2022). https://doi.org/10.1007/s13042-022-01569-9

  • DOI: https://doi.org/10.1007/s13042-022-01569-9
