Abstract
Machine learning on personal data has been widely and successfully deployed in recent years, so protecting the privacy of a personal dataset while preserving its utility is a major subject in the privacy community. However, a classifier trained on an anonymized dataset typically has poorer generalization performance than one trained on the original dataset. We adapt the idea of Tramèr et al.'s model extraction attack to obtain, from anonymized datasets, models similar to the original model. This approach also avoids publishing the target variable as-is. Using three open datasets, we experimentally measure how close the model trained on a dataset constructed by our method is to the original model. In particular, on the Nursery dataset, the model trained on our anonymized dataset with \(k = 20\)-anonymity agrees with the original model on almost all (\(95\%\)) predictions over a given test dataset.
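As background for the two notions the abstract relies on, a minimal sketch of (a) the k-anonymity condition — every combination of quasi-identifier values must occur in at least k records — and (b) the agreement rate between two models' predictions, which is the closeness measure the abstract reports. The record fields and values below are hypothetical toy data, not taken from the paper's datasets:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values
    occurs in at least k records (the k-anonymity condition)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def agreement(preds_a, preds_b):
    """Fraction of test points on which two models predict the same label."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)

# Toy records: 'age' generalized to ranges, 'zip' truncated (hypothetical values).
records = [
    {"age": "20-29", "zip": "130**", "label": "yes"},
    {"age": "20-29", "zip": "130**", "label": "no"},
    {"age": "30-39", "zip": "148**", "label": "yes"},
    {"age": "30-39", "zip": "148**", "label": "yes"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

In the paper's setting, `agreement` would be computed between the original model's predictions and those of the model trained on the anonymized data, over a held-out test set.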
References
Anonymisation standard for publishing health and social care data specification (2013). https://digital.nhs.uk/
Guidelines for de-identification of personal data (2016). https://www.privacy.go.kr/
Privacy by design in big data (2016). https://op.europa.eu/
Ayala-Rivera, V., McDonagh, P., Cerqueus, T., Murphy, L., Thorpe, C.: Enhancing the utility of anonymized data by improving the quality of generalization hierarchies. Trans. Data Privacy 10(1), 27–59 (2017)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
El Emam, K., Dankar, F.K.: Protecting privacy using k-anonymity. J. Am. Med. Inform. Assoc. 15(5), 627–637 (2008)
Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 1322–1333. Association for Computing Machinery, New York (2015). https://doi.org/10.1145/2810103.2813677
Fung, B.C., Wang, K., Philip, S.Y.: Anonymizing classification data for privacy preservation. IEEE Trans. Knowl. Data Eng. 19(5), 711–725 (2007)
Fung, B.C., Wang, K., Yu, P.S.: Top-down specialization for information and privacy preservation. In: 21st International Conference on Data Engineering, ICDE 2005, pp. 205–216. IEEE (2005)
Inan, A., Kantarcioglu, M., Bertino, E.: Using anonymized data for classification. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 429–440. IEEE (2009)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. Association for Computing Machinery, New York (2002). https://doi.org/10.1145/775047.775089
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 3146–3154. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
Kisilevich, S., Rokach, L., Elovici, Y., Shapira, B.: Efficient multidimensional suppression for k-anonymity. IEEE Trans. Knowl. Data Eng. 22(3), 334–347 (2009)
Li, N., Qardaji, W., Su, D.: On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2012, pp. 32–33. Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2414456.2414474
Malle, B., Kieseberg, P., Holzinger, A.: Interactive anonymization for privacy aware machine learning. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 18–22 September 2017, p. 15 (2017). http://ecmlpkdd2017.ijs.si/
Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgotten: towards machine learning on perturbed knowledge bases. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 251–266. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45507-5_17
Rodríguez-Hoyos, A., Estrada-Jiménez, J., Rebollo-Monedero, D., Parra-Arnau, J., Forné, J.: Does \(k\)-anonymous microaggregation affect machine-learned macrotrends? IEEE Access 6, 28258–28277 (2018)
Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017)
Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 571–588 (2002). https://doi.org/10.1142/S021848850200165X
Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002). https://doi.org/10.1142/S0218488502001648
Apple Differential Privacy Team: Learning with privacy at scale. Apple Mach. Learn. J. 1(8) (2017)
Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: Proceedings of the 25th USENIX Conference on Security Symposium, SEC 2016, pp. 601–618. USENIX Association, USA (2016)
Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 785–790. Association for Computing Machinery, New York (2006). https://doi.org/10.1145/1150402.1150504
Yamaoka, Y., Itoh, K.: \(k\)-presence-secrecy: practical privacy model as extension of \(k\)-anonymity. IEICE Trans. Inf. Syst. E100.D(4), 730–740 (2017). https://doi.org/10.1587/transinf.2016DAP0015
Yeom, S., Giacomelli, I., Fredrikson, M., Jha, S.: Privacy risk in machine learning: analyzing the connection to overfitting. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282 (2018)
Zheng, A., Casari, A.: Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st edn. O’Reilly Media Inc., Newton (2018)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fukuoka, T., Yamaoka, Y., Terada, T. (2020). Model Extraction Oriented Data Publishing with k-anonymity. In: Aoki, K., Kanaoka, A. (eds) Advances in Information and Computer Security. IWSEC 2020. Lecture Notes in Computer Science, vol. 12231. Springer, Cham. https://doi.org/10.1007/978-3-030-58208-1_13
Print ISBN: 978-3-030-58207-4
Online ISBN: 978-3-030-58208-1