
On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Abstract

Feature bagging models have proven practically useful in various contexts, among them outlier detection, where they build ensembles that reliably assign outlier scores to data samples. However, the interpretability of the resulting outlier detection methods remains largely unachieved. Among the standard approaches to interpreting black-box models are Shapley values, which clarify the role of each input feature. However, Shapley values come with high computational costs that restrict their use to fairly low-dimensional applications. We propose bagged Shapley values, a method that makes feature bagging ensembles interpretable, especially for outlier detection. The method not only assigns local importance scores to each feature of the original space, increasing interpretability, but also resolves the computational issue: the bagged Shapley values can be computed exactly in polynomial time.
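The polynomial-time claim rests on the ensemble structure: each submodel sees only a small feature bag, so an exact Shapley computation, exponential only in the bag size, stays cheap, and the per-feature contributions can then be aggregated across bags. The sketch below illustrates this idea only; the function names, the baseline-replacement value function, and the averaging aggregation are assumptions made for illustration, not the paper's exact formulation (see the linked repository for that).

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(score_fn, x, baseline, bag):
    """Exact Shapley values of score_fn at x, restricted to the features in `bag`.

    Features outside a coalition are replaced by `baseline` values (an assumed
    value function, not necessarily the paper's). Cost is O(2^|bag|), i.e.
    cheap whenever bags are small.
    """
    n = len(bag)
    phi = {f: 0.0 for f in bag}
    for f in bag:
        others = [g for g in bag if g != f]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                z = baseline.copy()
                for g in S:
                    z[g] = x[g]
                v_without = score_fn(z)   # v(S)
                z[f] = x[f]
                v_with = score_fn(z)      # v(S ∪ {f})
                phi[f] += weight * (v_with - v_without)
    return phi


def bagged_shapley(submodels, x, baseline, d):
    """Assumed aggregation: average each feature's Shapley value
    over the bags that contain it."""
    totals, counts = np.zeros(d), np.zeros(d)
    for score_fn, bag in submodels:
        for f, v in exact_shapley(score_fn, x, baseline, bag).items():
            totals[f] += v
            counts[f] += 1
    return totals / np.maximum(counts, 1)


# Toy ensemble: each submodel scores a sample by its squared distance
# to a random center, measured only on the submodel's 2-feature bag.
rng = np.random.default_rng(0)
d = 5
submodels = []
for _ in range(20):
    bag = list(rng.choice(d, size=2, replace=False))
    center = rng.normal(size=d)
    submodels.append(
        (lambda z, c=center, b=bag: float(np.sum((z[b] - c[b]) ** 2)), bag)
    )

x = rng.normal(size=d)
x[3] += 5.0                    # make one feature clearly anomalous
importances = bagged_shapley(submodels, x, baseline=np.zeros(d), d=d)
print(importances)             # feature 3 should dominate
```

Because each exact computation touches only a constant-size bag, the total cost grows linearly with the number of submodels and polynomially with the dimensionality, which is the shape of the guarantee the abstract claims.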

This work was supported by the Lamarr-Institute for ML and AI, the research training group Dataninja, the Research Center Trustworthy Data Science and Security, the Federal Ministry of Education and Research of Germany and the German federal state of NRW. The Linux HPC cluster at TU Dortmund University, a project of the German Research Foundation, provided the computing power.


Notes

  1. https://github.com/KDD-OpenSource/ensemble_shapley.

  2. All experiments were performed on Intel Xeon E5 CPUs. We stick to CPUs rather than GPUs even for the neural network submodels; the choice is justified by the higher degree of parallelization CPUs allow across the many independent submodels.

  3. The isolation forest takes about \(220\,\text{min}\) of CPU time. DEAN requires about \(113\,\text{days}\); however, the independent ensembles are easy to parallelize, as sketched below, and less accurate results can already be achieved with ten thousand submodels (\(27\,\text{hours}\)).
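The parallelization argument in note 3 is easy to see in code: the submodels share no state, so training and scoring map directly onto CPU cores and wall-clock time drops almost linearly with core count. Below is a minimal sketch of that structure, assuming a toy distance-based submodel; fit_and_score and the synthetic data are illustrative stand-ins, not DEAN or the isolation forest.

```python
# Minimal sketch of the embarrassingly parallel ensemble structure the note
# relies on; the toy submodel is an assumption, not the paper's models.
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def fit_and_score(args):
    """Train one independent submodel on a random feature bag and return
    per-sample outlier scores (here: distance to the bag-wise mean)."""
    seed, X = args
    rng = np.random.default_rng(seed)
    bag = rng.choice(X.shape[1], size=2, replace=False)
    center = X[:, bag].mean(axis=0)
    return np.linalg.norm(X[:, bag] - center, axis=1)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(1000, 10))
    # Submodels share no state, so they map cleanly onto CPU cores.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(fit_and_score, ((s, X) for s in range(100))))
    print(np.mean(scores, axis=0)[:5])   # ensemble score = mean over submodels
```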


Author information


Corresponding author

Correspondence to Simon Klüttermann.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Klüttermann, S., Balestra, C., Müller, E. (2024). On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_4


  • DOI: https://doi.org/10.1007/978-981-97-2259-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2261-7

  • Online ISBN: 978-981-97-2259-4

  • eBook Packages: Computer Science (R0)
