research-article

Learning Privacy-Preserving Embeddings for Image Data to Be Published

Authors:

Shou-De Lin

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 6

Article No.: 105, Pages 1 - 26

https://doi.org/10.1145/3623404

Published: 14 November 2023

Abstract

Deep learning excels at learning feature representations that perform well across many application domains. Recent work, however, has shown that private attributes of users and patients (e.g., identity, gender, and race) can be accurately inferred from image data. To reduce the risk of privacy leakage, data owners can release embeddings rather than the original images. In this article, we aim to learn privacy-preserving embeddings from image data. The embeddings must maintain data utility (e.g., preserve performance on the main task, such as disease prediction) while preventing the private attributes of data instances from being accurately inferred. We also want the embeddings to be hard to use for reconstructing the original images. We propose a hybrid method based on multi-task learning to achieve these goals. The key idea is twofold. The first part is to learn a feature encoder that benefits the main task and fools the sensitive task at the same time, via iterative training and feature disentanglement. The second is to incorporate the learning of adversarial examples to mislead sensitive-attribute classification. Experiments on the Multi-Attribute Facial Landmark (MAFL) and NIH Chest X-ray datasets demonstrate the effectiveness of our hybrid method. A set of advanced studies also shows the usefulness of each model component, the difficulty of data reconstruction, and the performance impact of task correlation.
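
The second idea in the abstract, using adversarial examples to mislead the sensitive-attribute classifier, can be illustrated with a minimal sketch. This is not the article's implementation. It assumes, purely for illustration, that the perturbation is applied directly to an embedding, that the sensitive classifier is a fixed logistic model, and that the perturbation follows the fast gradient sign method. The function name `fgsm_perturb` and all numeric values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_perturb(embedding, weights, bias, label, epsilon):
    """Shift an embedding in the direction that increases the sensitive
    classifier's binary cross-entropy loss (fast gradient sign method).

    Assumption: the sensitive classifier is a logistic model with the
    given weights and bias; the article's actual classifier may differ.
    """
    prob = sigmoid(embedding @ weights + bias)
    # Gradient of the cross-entropy loss with respect to the embedding.
    grad = (prob - label) * weights
    return embedding + epsilon * np.sign(grad)


# Hypothetical toy values, not taken from the article.
weights = np.array([2.0, -1.0, 0.5])
bias = 0.0
embedding = np.array([0.4, -0.2, 0.1])
label = 1.0  # true sensitive attribute of this instance

before = sigmoid(embedding @ weights + bias)
perturbed = fgsm_perturb(embedding, weights, bias, label, epsilon=0.5)
after = sigmoid(perturbed @ weights + bias)

print(f"P(sensitive = 1) before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the classifier's confidence in the true sensitive label drops after the perturbation. A full method would also have to check that the main task's accuracy survives the same perturbation, which this sketch does not do.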

Published In

ACM Transactions on Intelligent Systems and Technology Volume 14, Issue 6

December 2023

493 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3632517

Editor:
Huan Liu
Arizona State University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2023

Online AM: 08 September 2023

Accepted: 13 August 2023

Revised: 14 July 2023

Received: 14 July 2022

Published in TIST Volume 14, Issue 6

Funding Sources

National Science and Technology Council (NSTC) of Taiwan
Institute of Information Science (IIS), Academia Sinica, Taiwan
