Abstract
Semi-supervised anomaly detection has attracted wide interest because it does not require counterexamples during training. Existing competence measures for semi-supervised dynamic ensemble anomaly detection models do not account for the imbalance of the training samples, which results in serious overfitting to normal samples. This paper proposes two outlier-sensitive measures for estimating the competence of base classifiers in dynamic ensemble models. When a normal sample is correctly classified, both measures give a higher positive score to base classifiers whose confidence is closer to 0.5, in contrast to the conventional idea that base classifiers with higher confidence should obtain higher scores. When a sample is misclassified, the Output-based Outlier-Sensitive measure calculates a negative score from the confidence output by the base classifier, while the Cost-Sensitive-based Outlier-Sensitive measure assigns a negative score according to the category of the sample. Experiments are carried out on 30 datasets from public repositories under the unified framework proposed in this paper. The results show that dynamic ensemble models with our competence measures can outperform a number of typical ensemble models in terms of G-mean and F1, regardless of the pseudo outlier labeling method and base classifier selection method used in the model.
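The scoring rules described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so every functional form here is an assumption made for illustration: the reward `1.5 - confidence` for a correctly classified normal sample, the reward for a correctly classified outlier, the cost values, and all names (`Vote`, `output_based_score`, `cost_sensitive_score`, `competence`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

NORMAL, OUTLIER = 0, 1


@dataclass(frozen=True)
class Vote:
    """One base classifier's decision on one validation sample (hypothetical type)."""
    true_label: int
    predicted_label: int
    confidence: float  # confidence in predicted_label, assumed to lie in [0.5, 1]


def _reward(vote: Vote) -> float:
    """Positive score for a correct decision.

    From the abstract: on a normal sample, confidence closer to 0.5 scores higher.
    Assumed form: 1.5 - confidence. The outlier case is not specified in the
    abstract; returning the confidence itself is an assumption.
    """
    if vote.true_label == NORMAL:
        return 1.5 - vote.confidence
    return vote.confidence


def output_based_score(vote: Vote) -> float:
    """Output-based variant: the penalty depends on the confidence (assumed form)."""
    if vote.predicted_label == vote.true_label:
        return _reward(vote)
    return -vote.confidence


def cost_sensitive_score(vote: Vote, cost_normal: float = 1.0,
                         cost_outlier: float = 2.0) -> float:
    """Cost-sensitive variant: the penalty depends on the sample's category.

    The cost values are illustrative assumptions, not values from the paper.
    """
    if vote.predicted_label == vote.true_label:
        return _reward(vote)
    return -(cost_outlier if vote.true_label == OUTLIER else cost_normal)


def competence(votes: Sequence[Vote], score: Callable[[Vote], float]) -> float:
    """Average score of one base classifier over a set of validation samples."""
    return sum(score(v) for v in votes) / len(votes)
```

Under these assumed forms, a classifier that is barely confident on a correctly classified normal sample earns more than one that is very confident, while the two variants differ only in how a misclassification is penalized.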
Data Availability
The datasets supporting the results of this article all come from the KEEL, ELKI and ODDS public databases.
Code Availability
Custom code.
Notes
To help other researchers reproduce our experimental results: during implementation, we found that one of the base one-class classifiers, KNN_DD (see Sect. 4.2), from 'dd_tools' cannot return a reasonable value on training samples. In this case, we use a conversion method offered by 'dd_tools' itself, which directly normalizes the outputs for validation samples and test instances, and we apply it only to KNN_DD.
Acknowledgements
The authors would like to thank their colleagues from the machine learning group for discussions on this paper. The authors also thank Kangsheng Li and Zhiyu Liu for their support with language translation.
Funding
The authors have no relevant financial or non-financial interests to disclose.
Author information
Contributions
Conceptualization: XG, BL; Methodology: SF, XG; Formal analysis and investigation: SF, XG; Writing—original draft preparation: SF; Writing—review and editing: XG, BX, XJ, ZH, GZ, XH; Supervision: XG.
Ethics declarations
Ethical Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Appendices
Appendix
Detailed Results of Different Ensemble Models in Terms of TPR and FPR
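For readers relating TPR and FPR to the G-mean and F1 used in the main comparison, the standard definitions can be sketched as follows. This is a minimal sketch that assumes outliers are treated as the positive class, which is an assumption and not something stated in this section; the function names and the example counts are hypothetical.

```python
import math


def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (TPR, FPR) from confusion-matrix counts; 0.0 when a rate is undefined."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def g_mean(tp: int, fn: int, fp: int, tn: int) -> float:
    """Geometric mean of TPR and TNR, where TNR = 1 - FPR."""
    tpr, fpr = rates(tp, fn, fp, tn)
    return math.sqrt(tpr * (1.0 - fpr))


def f1(tp: int, fn: int, fp: int, tn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up counts for illustration only: 10 outliers and 100 normal samples.
example = dict(tp=8, fn=2, fp=10, tn=90)
print(rates(**example), g_mean(**example), f1(**example))
```

G-mean depends only on TPR and FPR, whereas F1 also depends on the class ratio through precision, so the two metrics can rank models differently on imbalanced data.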
Detailed Experimental Results on Different Pseudo Outlier Labeling Methods
The numbers of pseudo outliers labeled by the different methods are presented in Table 13. ASC and ASC-t% label more samples as pseudo outliers than the other methods, and the number they label varies from trial to trial (Tables 14, 15, 16 and 17).
Detailed Experimental Results on Different Base Classifier Selection Methods
About this article
Cite this article
Fu, S., Gao, X., Li, B. et al. Two Outlier-Sensitive Measures for Semi-supervised Dynamic Ensemble Anomaly Detection Models. Neural Process Lett 55, 3429–3470 (2023). https://doi.org/10.1007/s11063-022-11017-y
DOI: https://doi.org/10.1007/s11063-022-11017-y