
On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Abstract

Feature bagging models have proven practically useful in various contexts, among them outlier detection, where they build ensembles that reliably assign outlier scores to data samples. However, the interpretability of the resulting outlier detection methods remains largely unachieved. Among the standard approaches to interpreting black-box models are Shapley values, which clarify the role of each input feature. However, Shapley values come with high computational costs that restrict their use to fairly low-dimensional applications. We propose bagged Shapley values, a method that makes feature bagging ensembles interpretable, especially for outlier detection. The method not only assigns local importance scores to each feature of the original space, increasing interpretability, but also resolves the computational issue: the bagged Shapley values can be computed exactly in polynomial time.
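The polynomial-time claim rests on the ensemble structure: each submodel sees only a small feature bag, so an exact Shapley computation, exponential only in the bag size, stays cheap, and the per-feature contributions can then be aggregated across bags. The sketch below illustrates this idea only; the function names, the baseline-replacement value function, and the averaging aggregation are assumptions made for illustration, not the paper's exact formulation (see the linked repository for that).

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(score_fn, x, baseline, bag):
    """Exact Shapley values of score_fn at x, restricted to the features in `bag`.

    Features outside a coalition are replaced by `baseline` values (an assumed
    value function, not necessarily the paper's). Cost is O(2^|bag|), i.e.
    cheap whenever bags are small.
    """
    n = len(bag)
    phi = {f: 0.0 for f in bag}
    for f in bag:
        others = [g for g in bag if g != f]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                z = baseline.copy()
                for g in S:
                    z[g] = x[g]
                v_without = score_fn(z)   # v(S)
                z[f] = x[f]
                v_with = score_fn(z)      # v(S ∪ {f})
                phi[f] += weight * (v_with - v_without)
    return phi


def bagged_shapley(submodels, x, baseline, d):
    """Assumed aggregation: average each feature's Shapley value
    over the bags that contain it."""
    totals, counts = np.zeros(d), np.zeros(d)
    for score_fn, bag in submodels:
        for f, v in exact_shapley(score_fn, x, baseline, bag).items():
            totals[f] += v
            counts[f] += 1
    return totals / np.maximum(counts, 1)


# Toy ensemble: each submodel scores a sample by its squared distance
# to a random center, measured only on the submodel's 2-feature bag.
rng = np.random.default_rng(0)
d = 5
submodels = []
for _ in range(20):
    bag = list(rng.choice(d, size=2, replace=False))
    center = rng.normal(size=d)
    submodels.append(
        (lambda z, c=center, b=bag: float(np.sum((z[b] - c[b]) ** 2)), bag)
    )

x = rng.normal(size=d)
x[3] += 5.0                    # make one feature clearly anomalous
importances = bagged_shapley(submodels, x, baseline=np.zeros(d), d=d)
print(importances)             # feature 3 should dominate
```

Because each exact computation touches only a constant-size bag, the total cost grows linearly with the number of submodels and polynomially with the dimensionality, which is the shape of the guarantee the abstract claims.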

This work was supported by the Lamarr-Institute for ML and AI, the research training group Dataninja, the Research Center Trustworthy Data Science and Security, the Federal Ministry of Education and Research of Germany and the German federal state of NRW. The Linux HPC cluster at TU Dortmund University, a project of the German Research Foundation, provided the computing power.


Notes

  1. https://github.com/KDD-OpenSource/ensemble_shapley.

  2. All experiments were performed on Intel Xeon E5 CPUs. We stick to CPUs rather than GPUs even for the neural network submodels; the choice is justified by the higher degree of parallelization CPUs allow across the many independent submodels.

  3. The isolation forest takes about \(220\,\text{min}\) of CPU time. DEAN requires about \(113\,\text{days}\); however, the independent ensembles are easy to parallelize, as sketched below, and less accurate results can already be achieved with ten thousand submodels (\(27\,\text{hours}\)).
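The parallelization argument in note 3 is easy to see in code: the submodels share no state, so training and scoring map directly onto CPU cores and wall-clock time drops almost linearly with core count. Below is a minimal sketch of that structure, assuming a toy distance-based submodel; fit_and_score and the synthetic data are illustrative stand-ins, not DEAN or the isolation forest.

```python
# Minimal sketch of the embarrassingly parallel ensemble structure the note
# relies on; the toy submodel is an assumption, not the paper's models.
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def fit_and_score(args):
    """Train one independent submodel on a random feature bag and return
    per-sample outlier scores (here: distance to the bag-wise mean)."""
    seed, X = args
    rng = np.random.default_rng(seed)
    bag = rng.choice(X.shape[1], size=2, replace=False)
    center = X[:, bag].mean(axis=0)
    return np.linalg.norm(X[:, bag] - center, axis=1)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(1000, 10))
    # Submodels share no state, so they map cleanly onto CPU cores.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(fit_and_score, ((s, X) for s in range(100))))
    print(np.mean(scores, axis=0)[:5])   # ensemble score = mean over submodels
```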


Author information


Corresponding author

Correspondence to Simon Klüttermann.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Klüttermann, S., Balestra, C., Müller, E. (2024). On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_4


  • DOI: https://doi.org/10.1007/978-981-97-2259-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2261-7

  • Online ISBN: 978-981-97-2259-4

  • eBook Packages: Computer Science (R0)
