Abstract
Statistical Disclosure Control protection methods perturb the non-confidential attributes of an original dataset and publish the perturbed values along with the values of the confidential attributes. Traditionally, such a method is considered to achieve a good level of privacy if attackers who try to link an original record with its perturbed counterpart have a low probability of success. Another view has lately been gaining popularity: protection methods should resist not only record re-identification attacks, but also attacks that try to guess the true value of some confidential attribute of some original record(s). This is known as attribute disclosure risk.
In this paper we propose a simple strategy to estimate the attribute disclosure risk suffered by a protection method: a classifier constructed from the protected (public) dataset is used to predict the confidential attribute values of original records. After defining this approach in detail, we describe experiments that show both the power and the danger of the approach: very popular protection methods suffer from very high attribute disclosure risk values.
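The idea can be illustrated with a minimal sketch: the attacker trains a classifier on the released dataset (perturbed non-confidential attributes as features, published confidential attribute as label) and then applies it to the original attribute values of target records; the fraction of correct guesses is an empirical estimate of the attribute disclosure risk. The function name, the choice of a decision tree, and the synthetic additive-noise data below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a classifier-based attribute disclosure risk estimate (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def estimate_attribute_disclosure_risk(X_protected, y_confidential, X_original, y_true):
    """Train a classifier on the released (protected) dataset and measure how
    often it recovers the confidential attribute of the original records."""
    clf = DecisionTreeClassifier().fit(X_protected, y_confidential)
    predictions = clf.predict(X_original)
    return float(np.mean(predictions == y_true))  # fraction of correct guesses

# Tiny synthetic example: additive-noise "protection" of two numeric attributes.
rng = np.random.default_rng(0)
X_original = rng.normal(size=(500, 2))                                    # original non-confidential attributes
y_confidential = (X_original[:, 0] + X_original[:, 1] > 0).astype(int)    # confidential attribute, published in clear
X_protected = X_original + rng.normal(scale=0.1, size=X_original.shape)   # perturbed, released attributes

risk = estimate_attribute_disclosure_risk(X_protected, y_confidential, X_original, y_confidential)
print(f"Estimated attribute disclosure risk: {risk:.2f}")
```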
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Nin, J., Herranz, J., Torra, V. (2010). Using Classification Methods to Evaluate Attribute Disclosure Risk. In: Torra, V., Narukawa, Y., Daumas, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2010. Lecture Notes in Computer Science, vol. 6408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16292-3_27
DOI: https://doi.org/10.1007/978-3-642-16292-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16291-6
Online ISBN: 978-3-642-16292-3