Abstract
Data auditing is a process to verify whether certain data have been removed from a trained model. A recently proposed method [10] uses Kolmogorov-Smirnov (KS) distance for such data auditing. However, it fails under certain practical conditions. In this paper, we propose a new method called Ensembled Membership Auditing (\(\mathsf {EMA}\)) for auditing data removal to overcome these limitations. We compare both methods using benchmark datasets (MNIST and SVHN) and Chest X-ray datasets with multi-layer perceptrons (MLP) and convolutional neural networks (CNN). Our experiments show that \(\mathsf {EMA}\) is robust under various conditions, including the failure cases of the previously proposed method. Our code is available at: https://github.com/Hazelsuko07/EMA.
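For readers unfamiliar with the KS distance mentioned above, the following is a minimal sketch of the standard two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between two empirical cumulative distribution functions. It illustrates the statistic only; it is neither the auditing procedure of [10] nor \(\mathsf {EMA}\). The function name `ks_distance` and the score lists are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


def ks_distance(xs, ys):
    """Two-sample KS distance: the largest gap between the empirical CDFs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    n, m = len(xs_sorted), len(ys_sorted)
    gap = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # largest gap is attained at one of the observed values.
    for v in xs_sorted + ys_sorted:
        fx = bisect_right(xs_sorted, v) / n  # fraction of xs <= v
        fy = bisect_right(ys_sorted, v) / m  # fraction of ys <= v
        gap = max(gap, abs(fx - fy))
    return gap


# Hypothetical per-sample scores, made up for illustration only.
scores_a = [0.91, 0.95, 0.97, 0.99]
scores_b = [0.40, 0.55, 0.93, 0.98]

print(ks_distance(scores_a, scores_a))  # identical samples
print(ks_distance(scores_a, scores_b))  # differing samples
```

The distance is 0 when the two empirical distributions coincide and reaches 1 when the samples do not overlap at all.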
Notes
1. The real medical data experiment follows a similar setting.
References
Act, A.: Health insurance portability and accountability act of 1996. Public Law 104, 191 (1996)
Bourtoule, L., et al.: Machine unlearning. arXiv preprint arXiv:1912.03817 (2019)
Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., Song, D.: The secret sharer: evaluating and testing unintended memorization in neural networks. In: 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, pp. 267–284. USENIX Association, August 2019. https://www.usenix.org/conference/usenixsecurity19/presentation/carlini
Guo, C., Goldstein, T., Hannun, A., Maaten, L.v.d.: Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Kermany, D., Zhang, K., Goldbaum, M., et al.: Labeled optical coherence tomography (OCT) and chest X-ray images for classification. Mendeley Data 2(2) (2018)
Kermany, D.S., et al.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5), 1122–1131 (2018)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010). http://yann.lecun.com/exdb/mnist/
Liu, X., Tsaftaris, S.A.: Have you forgotten? A method to assess if machine learning models have forgotten data. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 95–105. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_10
Nasr, M., Shokri, R., Houmansadr, A.: Machine learning with membership privacy using adversarial regularization. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 634–646 (2018)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703 (2019)
Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016)
Salem, A., Zhang, Y., Humbert, M., Berrang, P., Fritz, M., Backes, M.: ML-leaks: model and data independent membership inference attacks and defenses on machine learning models. arXiv preprint arXiv:1806.01246 (2018)
Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017)
Song, L., Mittal, P.: Systematic evaluation of privacy risks of machine learning models. arXiv preprint arXiv:2003.10595 (2020)
Voigt, P., Von dem Bussche, A.: The EU general data protection regulation (GDPR). Intersoft consulting (2018)
Wang, L., Lin, Z.Q., Wong, A.: COVID-net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 10(1), 1–12 (2020)
Zhang, Y., Jia, R., Pei, H., Wang, W., Li, B., Song, D.: The secret revealer: generative model-inversion attacks against deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 253–261 (2020)
Acknowledgement
This project is supported in part by a Princeton University fellowship and Amazon Web Services (AWS) Machine Learning Research Awards. The authors would like to thank Liwei Song and Dr. Quanzheng Li for helpful discussions.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Y., Li, X., Li, K. (2021). \(\mathsf {EMA}\): Auditing Data Removal from Trained Models. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12905. Springer, Cham. https://doi.org/10.1007/978-3-030-87240-3_76
DOI: https://doi.org/10.1007/978-3-030-87240-3_76
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87239-7
Online ISBN: 978-3-030-87240-3
eBook Packages: Computer Science, Computer Science (R0)