
Model Extraction Oriented Data Publishing with k-anonymity

  • Conference paper
Advances in Information and Computer Security (IWSEC 2020)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 12231))


Abstract

Machine learning with personal data has been employed successfully in recent years, so protecting the privacy of a personal dataset while preserving its utility is a major subject in the privacy community. A classifier trained on an anonymized dataset, however, typically generalizes worse than a classifier trained on the original dataset. We adapt the concept of Tramèr et al.'s model extraction attack to publish anonymized datasets on which trained models behave similarly to the original model. This approach also avoids publishing the target variable as it is. Using three open datasets, we conduct experiments to determine how close the model trained on a dataset constructed by our method is to the original model. In particular, on the Nursery dataset, the model trained on our anonymized dataset with \(k = 20\)-anonymity makes almost the same (\(95\%\)) predictions on a given test dataset as the original model.
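The abstract rests on two measurable notions: a released table satisfies k-anonymity when every combination of quasi-identifier values occurs at least k times, and closeness to the original model is measured as the fraction of test points on which the two models predict the same label. The following is a minimal illustrative sketch of both checks, not the authors' code; the function names, the toy table, and the quasi-identifier columns are invented for the example.

```python
# Hedged sketch: check k-anonymity of a released table and measure
# prediction agreement between two classifiers. Stdlib only.
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True iff every quasi-identifier value combination occurs >= k times."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())

def agreement(preds_a, preds_b):
    """Fraction of test points on which two models predict the same label."""
    assert len(preds_a) == len(preds_b) and preds_a
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)

# Toy released table with generalized quasi-identifiers (hypothetical data).
table = [
    {"age_range": "20-29", "zip_prefix": "130", "label": "yes"},
    {"age_range": "20-29", "zip_prefix": "130", "label": "no"},
    {"age_range": "30-39", "zip_prefix": "131", "label": "yes"},
    {"age_range": "30-39", "zip_prefix": "131", "label": "yes"},
]
print(is_k_anonymous(table, ["age_range", "zip_prefix"], 2))  # True
print(agreement(["yes", "no", "yes", "yes"],
                ["yes", "no", "no", "yes"]))                  # 0.75
```

Under this metric, the paper's Nursery result says the model trained on the published \(k = 20\)-anonymous dataset reaches an agreement of about 0.95 with the original model.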


References

  1. Anonymisation standard for publishing health and social care data specification (2013). https://digital.nhs.uk/

  2. Guidelines for de-identification of personal data (2016). https://www.privacy.go.kr/

  3. Privacy by design in big data (2016). https://op.europa.eu/

  4. Ayala-Rivera, V., McDonagh, P., Cerqueus, T., Murphy, L., Thorpe, C.: Enhancing the utility of anonymized data by improving the quality of generalization hierarchies. Trans. Data Privacy 10(1), 27–59 (2017)


  5. Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml

  6. El Emam, K., Dankar, F.K.: Protecting privacy using k-anonymity. J. Am. Med. Inform. Assoc. 15(5), 627–637 (2008)


  7. Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 1322–1333. Association for Computing Machinery, New York (2015). https://doi.org/10.1145/2810103.2813677

  8. Fung, B.C., Wang, K., Philip, S.Y.: Anonymizing classification data for privacy preservation. IEEE Trans. Knowl. Data Eng. 19(5), 711–725 (2007)


  9. Fung, B.C., Wang, K., Yu, P.S.: Top-down specialization for information and privacy preservation. In: 21st International Conference on Data Engineering, ICDE 2005, pp. 205–216. IEEE (2005)


  10. Inan, A., Kantarcioglu, M., Bertino, E.: Using anonymized data for classification. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 429–440. IEEE (2009)


  11. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. Association for Computing Machinery, New York (2002). https://doi.org/10.1145/775047.775089

  12. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 3146–3154. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf

  13. Kisilevich, S., Rokach, L., Elovici, Y., Shapira, B.: Efficient multidimensional suppression for k-anonymity. IEEE Trans. Knowl. Data Eng. 22(3), 334–347 (2009)


  14. Li, N., Qardaji, W., Su, D.: On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2012, pp. 32–33. Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2414456.2414474

  15. Malle, B., Kieseberg, P., Holzinger, A.: Interactive anonymization for privacy aware machine learning. In: European Conference on Machine Learning and Knowledge Discovery ECML-PKDD, 18–22 September 2017, p. 15 (2017). http://ecmlpkdd2017.ijs.si/

  16. Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgotten: towards machine learning on perturbed knowledge bases. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 251–266. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45507-5_17


  17. Rodríguez-Hoyos, A., Estrada-Jiménez, J., Rebollo-Monedero, D., Parra-Arnau, J., Forné, J.: Does \(k\)-anonymous microaggregation affect machine-learned macrotrends? IEEE Access 6, 28258–28277 (2018)


  18. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017)


  19. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 571–588 (2002). https://doi.org/10.1142/S021848850200165X

  20. Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002). https://doi.org/10.1142/S0218488502001648

  21. Apple Differential Privacy Team: Learning with privacy at scale. Apple Mach. Learn. J. 1(8) (2017)


  22. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: Proceedings of the 25th USENIX Conference on Security Symposium, SEC 2016, pp. 601–618. USENIX Association, USA (2016)


  23. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 785–790. Association for Computing Machinery, New York (2006). https://doi.org/10.1145/1150402.1150504

  24. Yamaoka, Y., Itoh, K.: \(k\)-presence-secrecy: practical privacy model as extension of \(k\)-anonymity. IEICE Trans. Inf. Syst. E100.D(4), 730–740 (2017). https://doi.org/10.1587/transinf.2016DAP0015

  25. Yeom, S., Giacomelli, I., Fredrikson, M., Jha, S.: Privacy risk in machine learning: analyzing the connection to overfitting. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282 (2018)


  26. Zheng, A., Casari, A.: Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st edn. O’Reilly Media Inc., Newton (2018)



Author information


Corresponding authors

Correspondence to Takeru Fukuoka, Yuji Yamaoka or Takeaki Terada.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Fukuoka, T., Yamaoka, Y., Terada, T. (2020). Model Extraction Oriented Data Publishing with k-anonymity. In: Aoki, K., Kanaoka, A. (eds) Advances in Information and Computer Security. IWSEC 2020. Lecture Notes in Computer Science(), vol 12231. Springer, Cham. https://doi.org/10.1007/978-3-030-58208-1_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58207-4

  • Online ISBN: 978-3-030-58208-1

  • eBook Packages: Computer Science, Computer Science (R0)
