
Balancing XAI with Privacy and Security Considerations

  • Conference paper
  • In: Computer Security. ESORICS 2023 International Workshops (ESORICS 2023)

Abstract

The acceptability of AI decisions and the efficiency of human-AI interaction become particularly significant when AI is incorporated into Critical Infrastructures (CI). To achieve these goals, eXplainable AI (XAI) modules must be integrated into the AI workflow. However, by design, XAI reveals the inner workings of AI systems, creating potential for privacy leakage and more effective adversarial attacks. In this literature review, we explore the complex interplay of explainability, privacy, and security within trustworthy AI, highlighting the inherent trade-offs and challenges. Our review reveals that XAI can lead to privacy leaks and increased susceptibility to adversarial attacks. We categorize our findings according to XAI taxonomy classes and provide a concise overview of the corresponding fundamental concepts. Furthermore, we discuss how XAI interacts with prevalent privacy defenses and how it addresses the unique requirements of the security domain. Our findings contribute to the growing literature on XAI in the realm of CI protection and beyond, paving the way for future research in trustworthy AI.
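To make the tension between explanations and privacy concrete, the sketch below is a purely illustrative toy example (our own, not code from the paper): it shows how even a simple gradient-based explanation of a logistic-regression model can act as a membership-inference signal, since training points tend to receive smaller input-gradient attributions than unseen points when the model fits its training data closely. All data, names, and parameters here are assumptions for illustration; the snippet assumes only NumPy and scikit-learn.

```python
# Illustrative sketch only (not from the paper): a gradient-based explanation
# used as a membership-inference signal. Assumes NumPy and scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data, split into "members" (used for
# training) and "non-members" (never seen by the model).
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           random_state=0)
X_mem, y_mem = X[:300], y[:300]
X_non, y_non = X[300:], y[300:]

model = LogisticRegression(max_iter=1000).fit(X_mem, y_mem)

def explanation_norm(model, X, y):
    """L2 norm of the input gradient of the log-loss, a simple saliency-style
    explanation for logistic regression: dL/dx = (p - y) * w."""
    p = model.predict_proba(X)[:, 1]
    grads = (p - y)[:, None] * model.coef_   # shape (n_samples, n_features)
    return np.linalg.norm(grads, axis=1)

mem_norms = explanation_norm(model, X_mem, y_mem)
non_norms = explanation_norm(model, X_non, y_non)

# Crude attack: points with "small" explanations are guessed to be training
# members, since the model fits members more closely than unseen points.
scores = np.concatenate([mem_norms, non_norms])
truth = np.concatenate([np.ones_like(mem_norms), np.zeros_like(non_norms)])
guess = scores < np.median(scores)
print("membership-inference accuracy:", (guess == truth).mean())
```

On this toy setup the thresholding attack is typically only modestly better than chance; with richer explanation methods and more overfitted models the signal strengthens, which is the kind of leakage risk surveyed in this review.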



Acknowledgments

This work was partially supported by the EU funded project ATLANTIS (Grant Agreement Number 101073909).

Author information

Correspondence to Theodoros Semertzidis.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Spartalis, C.N., Semertzidis, T., Daras, P. (2024). Balancing XAI with Privacy and Security Considerations. In: Katsikas, S., et al. Computer Security. ESORICS 2023 International Workshops. ESORICS 2023. Lecture Notes in Computer Science, vol 14399. Springer, Cham. https://doi.org/10.1007/978-3-031-54129-2_7


  • DOI: https://doi.org/10.1007/978-3-031-54129-2_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54128-5

  • Online ISBN: 978-3-031-54129-2

  • eBook Packages: Computer Science, Computer Science (R0)
