Abstract
Machine learning on personal data has been widely and successfully deployed in recent years, so protecting the privacy of a personal dataset while preserving its utility is a major subject in the privacy community. However, a classifier trained on an anonymized dataset typically has poorer generalization performance than one trained on the original dataset. We adapt the idea of Tramèr et al.'s model extraction attack to obtain, from anonymized datasets, models similar to the original model. This approach also avoids publishing the target variable as-is. Using three open datasets, we experimentally measure how close the model trained on a dataset constructed by our method is to the original model. In particular, on the Nursery dataset, the model trained on our anonymized dataset with \(k = 20\)-anonymity agrees with the original model on almost all (\(95\%\)) predictions over a given test dataset.
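As background for the two notions the abstract relies on, a minimal sketch of (a) the k-anonymity condition — every combination of quasi-identifier values must occur in at least k records — and (b) the agreement rate between two models' predictions, which is the closeness measure the abstract reports. The record fields and values below are hypothetical toy data, not taken from the paper's datasets:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values
    occurs in at least k records (the k-anonymity condition)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def agreement(preds_a, preds_b):
    """Fraction of test points on which two models predict the same label."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)

# Toy records: 'age' generalized to ranges, 'zip' truncated (hypothetical values).
records = [
    {"age": "20-29", "zip": "130**", "label": "yes"},
    {"age": "20-29", "zip": "130**", "label": "no"},
    {"age": "30-39", "zip": "148**", "label": "yes"},
    {"age": "30-39", "zip": "148**", "label": "yes"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

In the paper's setting, `agreement` would be computed between the original model's predictions and those of the model trained on the anonymized data, over a held-out test set.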
References
Anonymisation standard for publishing health and social care data specification (2013). https://digital.nhs.uk/
Guidelines for de-identification of personal data (2016). https://www.privacy.go.kr/
Privacy by design in big data (2016). https://op.europa.eu/
Ayala-Rivera, V., McDonagh, P., Cerqueus, T., Murphy, L., Thorpe, C.: Enhancing the utility of anonymized data by improving the quality of generalization hierarchies. Trans. Data Privacy 10(1), 27–59 (2017)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
El Emam, K., Dankar, F.K.: Protecting privacy using k-anonymity. J. Am. Med. Inform. Assoc. 15(5), 627–637 (2008)
Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 1322–1333. Association for Computing Machinery, New York (2015). https://doi.org/10.1145/2810103.2813677
Fung, B.C., Wang, K., Philip, S.Y.: Anonymizing classification data for privacy preservation. IEEE Trans. Knowl. Data Eng. 19(5), 711–725 (2007)
Fung, B.C., Wang, K., Yu, P.S.: Top-down specialization for information and privacy preservation. In: 21st International Conference on Data Engineering, ICDE 2005, pp. 205–216. IEEE (2005)
Inan, A., Kantarcioglu, M., Bertino, E.: Using anonymized data for classification. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 429–440. IEEE (2009)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. Association for Computing Machinery, New York (2002). https://doi.org/10.1145/775047.775089
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 3146–3154. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
Kisilevich, S., Rokach, L., Elovici, Y., Shapira, B.: Efficient multidimensional suppression for k-anonymity. IEEE Trans. Knowl. Data Eng. 22(3), 334–347 (2009)
Li, N., Qardaji, W., Su, D.: On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2012, pp. 32–33. Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2414456.2414474
Malle, B., Kieseberg, P., Holzinger, A.: Interactive anonymization for privacy aware machine learning. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 18–22 September 2017, p. 15 (2017). http://ecmlpkdd2017.ijs.si/
Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgotten: towards machine learning on perturbed knowledge bases. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 251–266. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45507-5_17
Rodríguez-Hoyos, A., Estrada-Jiménez, J., Rebollo-Monedero, D., Parra-Arnau, J., Forné, J.: Does \(k\)-anonymous microaggregation affect machine-learned macrotrends? IEEE Access 6, 28258–28277 (2018)
Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017)
Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 571–588 (2002). https://doi.org/10.1142/S021848850200165X
Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002). https://doi.org/10.1142/S0218488502001648
Apple Differential Privacy Team: Learning with privacy at scale. Apple Mach. Learn. J. 1(8) (2017)
Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: Proceedings of the 25th USENIX Conference on Security Symposium, SEC 2016, pp. 601–618. USENIX Association, USA (2016)
Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 785–790. Association for Computing Machinery, New York (2006). https://doi.org/10.1145/1150402.1150504
Yamaoka, Y., Itoh, K.: \(k\)-presence-secrecy: practical privacy model as extension of \(k\)-anonymity. IEICE Trans. Inf. Syst. E100.D(4), 730–740 (2017). https://doi.org/10.1587/transinf.2016DAP0015
Yeom, S., Giacomelli, I., Fredrikson, M., Jha, S.: Privacy risk in machine learning: analyzing the connection to overfitting. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282 (2018)
Zheng, A., Casari, A.: Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st edn. O’Reilly Media Inc., Newton (2018)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fukuoka, T., Yamaoka, Y., Terada, T. (2020). Model Extraction Oriented Data Publishing with k-anonymity. In: Aoki, K., Kanaoka, A. (eds) Advances in Information and Computer Security. IWSEC 2020. Lecture Notes in Computer Science, vol. 12231. Springer, Cham. https://doi.org/10.1007/978-3-030-58208-1_13
Print ISBN: 978-3-030-58207-4
Online ISBN: 978-3-030-58208-1