
Two Outlier-Sensitive Measures for Semi-supervised Dynamic Ensemble Anomaly Detection Models

Published in: Neural Processing Letters

Abstract

Semi-supervised anomaly detection has received wide interest because it does not require counterexamples during training. Existing competence measures for semi-supervised dynamic ensemble anomaly detection models do not consider the imbalanced nature of the training samples, which results in serious overfitting to normal samples. This paper proposes two outlier-sensitive measures to estimate the competence of base classifiers for dynamic ensemble models. When a normal sample is correctly classified, both measures give a higher positive score to base classifiers whose confidence is closer to 0.5, in contrast to the conventional idea that base classifiers with higher confidence should obtain higher scores. When a sample is misclassified, the Output-based Outlier-Sensitive measure computes a negative score from the confidence output by the base classifier, while the Cost-Sensitive-based Outlier-Sensitive measure assigns a negative score based on the category of the sample. Experiments are carried out on 30 datasets from public repositories under the unified framework proposed in this paper, and the results show that dynamic ensemble models equipped with our competence measures outperform a number of typical ensemble models in terms of G-mean and F1, regardless of the pseudo outlier labeling methods and base classifier selection methods used in the model.
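The following is a minimal sketch of the scoring behavior described in the abstract. The exact formulas appear in the paper; the function names, the linear scoring rule, and the cost ratio below are illustrative assumptions only.

```python
# Hypothetical sketch of the two competence measures; the linear scoring
# rule and the cost ratio are assumptions, not the paper's exact formulas.

def oos_competence(confidence: float, is_correct: bool) -> float:
    """Output-based Outlier-Sensitive (OOS) measure for one validation sample.

    `confidence` is the base classifier's confidence for its predicted
    label, assumed to lie in [0, 1].
    """
    if is_correct:
        # Correctly classified normal sample: confidence closer to 0.5
        # earns a HIGHER positive score, countering overfitting to the
        # majority (normal) class.
        return 1.0 - 2.0 * abs(confidence - 0.5)
    # Misclassified sample: negative score derived from the output
    # confidence, so confidently wrong classifiers are punished more.
    return -2.0 * abs(confidence - 0.5)


def csos_competence(confidence: float, is_correct: bool,
                    is_outlier: bool, outlier_cost: float = 5.0) -> float:
    """Cost-Sensitive-based Outlier-Sensitive (CSOS) measure.

    On a misclassification the penalty depends on the sample's category;
    `outlier_cost` is a hypothetical cost ratio penalizing errors on
    (pseudo) outliers more heavily than errors on normal samples.
    """
    if is_correct:
        return 1.0 - 2.0 * abs(confidence - 0.5)
    return -outlier_cost if is_outlier else -1.0
```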


Data Availability

The datasets supporting the results of this article are all from KEEL, ELKI and ODDS public databases.

Code Availability

Custom code.

Notes

  1. So that other researchers can better reproduce our experimental results: during implementation, we find that one of the base one-class classifiers, KNN_DD (see Sect. 4.2) from 'dd_tools', cannot return a reasonable value on training samples. In this case, we use a conversion method offered by 'dd_tools' itself, which directly normalizes the outputs for validation samples and test instances; this conversion is applied only to KNN_DD (see the normalization sketch after this list).

  2. It should be noted that for the experiments conducted in Sects. 4 and 5, the negative sign in Eq. 2 is discarded, so that a higher EM score indicates a more competent base classifier. This small modification does not affect performance but reduces confusion (see the sign-flip sketch after this list).

  3. https://sci2s.ugr.es/keel/datasets.php.

  4. https://elki-project.github.io/datasets/outlier.

  5. http://odds.cs.stonybrook.edu/#table1.

  6. http://homepage.tudelft.nl/n9d04/functions/Contents.html.
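The conversion in note 1 can be pictured as follows. This is a generic min-max rescaling sketch, not the actual 'dd_tools' routine; `normalize_scores` and its reference-set convention are assumptions for illustration.

```python
import numpy as np

def normalize_scores(reference_scores, scores):
    """Rescale raw one-class outputs into [0, 1] using a reference set
    (e.g., validation outputs), mimicking a direct output normalization."""
    ref = np.asarray(reference_scores, dtype=float)
    s = np.asarray(scores, dtype=float)
    lo, hi = ref.min(), ref.max()
    if hi == lo:  # degenerate reference set: all scores identical
        return np.full_like(s, 0.5)
    return np.clip((s - lo) / (hi - lo), 0.0, 1.0)
```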
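The sign convention in note 2 amounts to the equivalence demonstrated below; `raw_em` is a hypothetical vector of Eq. 2 values for the base classifier pool.

```python
import numpy as np

raw_em = np.array([-0.7, -0.2, -0.9])   # hypothetical Eq. 2 outputs
em = -raw_em                             # negative sign discarded
# The ranking is merely reversed: the most competent base classifier
# under the modified score is the same one as under the original Eq. 2.
assert int(np.argmax(em)) == int(np.argmin(raw_em))
```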


Acknowledgements

The authors would like to thank their colleagues from the machine learning group for discussions on this paper. They also thank Kangsheng Li and Zhiyu Liu for their help with language translation.

Funding

The authors have no relevant financial or non-financial interests to disclose.

Author information

Contributions

Conceptualization: XG, BL; Methodology: SF, XG; Formal analysis and investigation: SF, XG; Writing—original draft preparation: SF; Writing—review and editing: XG, BX, XJ, ZH, GZ, XH; Supervision: XG.

Corresponding author

Correspondence to Xin Gao.

Ethics declarations

Ethical Approval

Not applicable.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Detailed Results of Different Ensemble Models in Terms of TPR and FPR

See Tables 11 and 12.

Table 11 Detailed results of different ensemble models in terms of TPR
Table 12 Detailed results of different ensemble models in terms of FPR

Detailed Experimental Results on Different Pseudo Outlier Labeling Methods

The number of pseudo outliers labeled by each method is presented in Table 13. ASC and ASC-t% label more samples as pseudo outliers than the other methods, and the number of pseudo outliers they label varies from trial to trial (Tables 14, 15, 16 and 17).

Table 13 The number of pseudo outliers labeled by different methods
Table 14 Detailed results of dynamic ensemble models with EM and different pseudo outlier labeling methods in terms of G-mean
Table 15 Detailed results of dynamic ensemble models with EM and different pseudo outlier labeling methods in terms of F1
Table 16 Detailed results of dynamic ensemble models with OOS and different pseudo outlier labeling methods in terms of G-mean
Table 17 Detailed results of dynamic ensemble models with OOS and different pseudo outlier labeling methods in terms of F1

Detailed Experimental Results on Different Base Classifier Selection Methods

See Tables 18, 19, 20, 21, 22 and 23.

Table 18 Detailed results of dynamic ensemble models with EM and different base classifier selection methods in terms of G-mean
Table 19 Detailed results of dynamic ensemble models with EM and different base classifier selection methods in terms of F1
Table 20 The number of base classifiers selected by different methods when EM is used
Table 21 Detailed results of dynamic ensemble models with OOS and different base classifier selection methods in terms of G-mean
Table 22 Detailed results of dynamic ensemble models with OOS and different base classifier selection methods in terms of F1
Table 23 The number of base classifiers selected by different methods when OOS is used

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fu, S., Gao, X., Li, B. et al. Two Outlier-Sensitive Measures for Semi-supervised Dynamic Ensemble Anomaly Detection Models. Neural Process Lett 55, 3429–3470 (2023). https://doi.org/10.1007/s11063-022-11017-y

