
Balancing XAI with Privacy and Security Considerations

  • Conference paper
  • In: Computer Security. ESORICS 2023 International Workshops (ESORICS 2023)

Abstract

The acceptability of AI decisions and the efficiency of human-AI interaction become particularly significant when AI is incorporated into Critical Infrastructures (CI). To achieve these goals, eXplainable AI (XAI) modules must be integrated into the AI workflow. However, by design, XAI reveals the inner workings of AI systems, creating potential for privacy leakage and more effective adversarial attacks. In this literature review, we explore the complex interplay of explainability, privacy, and security within trustworthy AI, highlighting the inherent trade-offs and challenges. Our review reveals that XAI can lead to privacy leaks and increased susceptibility to adversarial attacks. We categorize our findings according to XAI taxonomy classes and provide a concise overview of the corresponding fundamental concepts. Furthermore, we discuss how XAI interacts with prevalent privacy defenses and how it addresses the unique requirements of the security domain. Our findings contribute to the growing literature on XAI in the realm of CI protection and beyond, paving the way for future research in trustworthy AI.
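To make the tension between explanations and privacy concrete, the sketch below is a purely illustrative toy example (our own, not code from the paper): it shows how even a simple gradient-based explanation of a logistic-regression model can act as a membership-inference signal, since training points tend to receive smaller input-gradient attributions than unseen points when the model fits its training data closely. All data, names, and parameters here are assumptions for illustration; the snippet assumes only NumPy and scikit-learn.

```python
# Illustrative sketch only (not from the paper): a gradient-based explanation
# used as a membership-inference signal. Assumes NumPy and scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data, split into "members" (used for
# training) and "non-members" (never seen by the model).
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           random_state=0)
X_mem, y_mem = X[:300], y[:300]
X_non, y_non = X[300:], y[300:]

model = LogisticRegression(max_iter=1000).fit(X_mem, y_mem)

def explanation_norm(model, X, y):
    """L2 norm of the input gradient of the log-loss, a simple saliency-style
    explanation for logistic regression: dL/dx = (p - y) * w."""
    p = model.predict_proba(X)[:, 1]
    grads = (p - y)[:, None] * model.coef_   # shape (n_samples, n_features)
    return np.linalg.norm(grads, axis=1)

mem_norms = explanation_norm(model, X_mem, y_mem)
non_norms = explanation_norm(model, X_non, y_non)

# Crude attack: points with "small" explanations are guessed to be training
# members, since the model fits members more closely than unseen points.
scores = np.concatenate([mem_norms, non_norms])
truth = np.concatenate([np.ones_like(mem_norms), np.zeros_like(non_norms)])
guess = scores < np.median(scores)
print("membership-inference accuracy:", (guess == truth).mean())
```

On this toy setup the thresholding attack is typically only modestly better than chance; with richer explanation methods and more overfitted models the signal strengthens, which is the kind of leakage risk surveyed in this review.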



Acknowledgments

This work was partially supported by the EU funded project ATLANTIS (Grant Agreement Number 101073909).

Author information

Correspondence to Theodoros Semertzidis.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Spartalis, C.N., Semertzidis, T., Daras, P. (2024). Balancing XAI with Privacy and Security Considerations. In: Katsikas, S., et al. Computer Security. ESORICS 2023 International Workshops. ESORICS 2023. Lecture Notes in Computer Science, vol 14399. Springer, Cham. https://doi.org/10.1007/978-3-031-54129-2_7


  • DOI: https://doi.org/10.1007/978-3-031-54129-2_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54128-5

  • Online ISBN: 978-3-031-54129-2

  • eBook Packages: Computer Science, Computer Science (R0)
