research-article

GraphLeak: Patient Record Leakage through Gradients with Knowledge Graph

Authors:

Xi Sheryl Zhang,

Yefeng Zheng

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 4706 - 4716

https://doi.org/10.1145/3589334.3648157

Published: 13 May 2024

Abstract

In real clinical settings, medical data are scattered across multiple hospitals. Because of security and privacy concerns, it is almost impossible to gather all the data in one place and train a unified model. Multi-node machine learning systems are therefore currently the mainstream form of model training in healthcare. However, distributed training relies on the exchange of gradients, which has been shown to carry a risk of privacy leakage: malicious attackers can restore a user's sensitive data from the publicly shared gradients. This is a serious problem for extremely private data such as Electronic Healthcare Records (EHRs). The performance of previous gradient attack methods drops rapidly as the training batch size increases, which makes them less threatening in practice. In this paper, however, we find that in the medical domain, leveraging prior knowledge such as a medical knowledge graph can significantly amplify the leakage risk. In particular, we present GraphLeak, which incorporates a medical knowledge graph into gradient leakage attacks. GraphLeak improves the restoration quality of gradient attacks even for large batches of data. We verify the method experimentally on electronic healthcare record datasets, including eICU and MIMIC-III, where it achieves state-of-the-art attack performance compared with previous work. Code is available at https://github.com/anonymous4ai/GraphLeak.
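
The two observations in the abstract, that shared gradients can reveal training inputs and that larger batches make recovery harder, can be illustrated with a toy sketch. This is not GraphLeak and uses no knowledge graph; it assumes a single linear model with squared-error loss, and the function names, weights, and feature vectors are hypothetical values chosen for illustration. For one sample, the weight gradient is the residual times the input and the bias gradient is the residual, so their ratio returns the input exactly. For an averaged batch, the same ratio returns only a residual-weighted blend of the records.

```python
def linear_gradients(weights, bias, batch):
    """Mean gradient of 0.5 * (w.x + b - y)^2 over a batch of (x, y) pairs."""
    n = len(batch)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in batch:
        residual = sum(w * xi for w, xi in zip(weights, x)) + bias - y
        for i, xi in enumerate(x):
            grad_w[i] += residual * xi / n
        grad_b += residual / n
    return grad_w, grad_b


def recover_input(grad_w, grad_b):
    """Attacker's guess grad_w / grad_b; exact only for a batch of one."""
    return [g / grad_b for g in grad_w]


# Hypothetical model parameters and records (features, label).
weights, bias = [0.5, -1.0, 2.0], 0.1
record = ([1.0, 0.0, 3.0], 1.0)
other = ([0.0, 2.0, 1.0], 0.0)

# Batch of one: the ratio reproduces the private feature vector.
single = recover_input(*linear_gradients(weights, bias, [record]))

# Batch of two: the ratio is a blend of both records, not either one.
mixed = recover_input(*linear_gradients(weights, bias, [record, other]))

print("batch of 1:", single)
print("batch of 2:", mixed)
```

Attacks on deeper models typically replace this closed-form ratio with an optimization that matches dummy gradients to the observed ones, and priors are one way to narrow the search when batches are large.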

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
  2. Machine learning
    1. Machine learning algorithms
      1. Regularization
    2. Machine learning approaches
      1. Neural networks

Published In

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN:9798400701719

DOI:10.1145/3589334

General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)

Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)

Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

CCF-Tencent Open Research Fund

Conference

WWW '24

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
