Learning Privacy-Preserving Embeddings for Image Data to Be Published

Published: 14 November 2023

Abstract

Deep learning excels at learning feature representations that perform well across many application domains. Recent work has shown that private attributes of users and patients (e.g., identity, gender, and race) can be accurately inferred from image data. To reduce the risk of privacy leakage, data owners can release embeddings rather than the original images. In this article, we aim to learn privacy-preserving embeddings from image data. The embeddings must maintain data utility (e.g., preserve performance on the main task, such as disease prediction) while preventing the private attributes of data instances from being accurately inferred. We also want the embeddings to be hard to use for reconstructing the original images. We propose a hybrid method based on multi-task learning to reach this goal. The key idea is twofold. First, we learn a feature encoder that benefits the main task and fools the sensitive task at the same time, via iterative training and feature disentanglement. Second, we incorporate the learning of adversarial examples to degrade the performance of sensitive attribute classification. Experiments on the Multi-Attribute Facial Landmark (MAFL) and NIH Chest X-ray datasets demonstrate the effectiveness of our hybrid method. A set of further studies also shows the usefulness of each model component, the difficulty of data reconstruction, and the performance impact of task correlation.
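
To make the second idea concrete, the following is a minimal sketch of how an adversarial perturbation can raise the loss of a sensitive-attribute classifier that reads an embedding. It is not the article's implementation: the linear classifier, its weights `w` and bias `b`, the embedding `z`, the label `s`, and the sign-of-gradient update with size `step` are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sensitive_loss(z, s, w, b):
    """Binary cross-entropy of a (hypothetical) linear sensitive classifier on embedding z."""
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return -(s * math.log(p) + (1 - s) * math.log(1 - p))


def sign(x):
    return (x > 0) - (x < 0)


def perturb_embedding(z, s, w, b, step):
    """Move each embedding coordinate by +/- step in the direction that
    increases the sensitive classifier's loss (sign-of-gradient update)."""
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    # Gradient of the cross-entropy loss with respect to z for a linear model.
    grad = [(p - s) * wi for wi in w]
    return [zi + step * sign(gi) for zi, gi in zip(z, grad)]


# Illustrative values only; none of these come from the article.
w = [0.8, -1.2, 0.5, 0.3]   # assumed classifier weights
b = -0.1                    # assumed classifier bias
z = [1.0, -0.5, 0.2, 0.7]   # assumed embedding of one image
s = 1                       # assumed true sensitive label
step = 0.25                 # assumed perturbation budget per coordinate

z_adv = perturb_embedding(z, s, w, b, step)
loss_before = sensitive_loss(z, s, w, b)
loss_after = sensitive_loss(z_adv, s, w, b)
print(f"sensitive loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In this toy setting the perturbed embedding stays within `step` of the original in every coordinate while the sensitive classifier's loss goes up. A full method would also have to check that the main-task prediction is not harmed, which this sketch does not attempt.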

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 6
December 2023
493 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3632517
Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2023
Online AM: 08 September 2023
Accepted: 13 August 2023
Revised: 14 July 2023
Received: 14 July 2022
Published in TIST Volume 14, Issue 6

Author Tags

  1. Privacy preservation
  2. medical images
  3. adversarial learning
  4. feature disentanglement
  5. adversarial examples

Qualifiers

  • Research-article

Funding Sources

  • National Science and Technology Council (NSTC) of Taiwan
  • Institute of Information Science (IIS), Academia Sinica, Taiwan
