Abstract
Statistical Disclosure Control protection methods perturb the non-confidential attributes of an original dataset and publish the perturbed values along with the values of the confidential attributes. Traditionally, such a method is considered to achieve a good level of privacy if attackers who try to link an original record with its perturbed counterpart have a low probability of success. Another view has lately been gaining popularity: protection methods should resist not only record re-identification attacks, but also attacks that try to guess the true value of some confidential attribute of some original record(s). This is known as attribute disclosure risk.
In this paper we propose a simple strategy to estimate the attribute disclosure risk suffered by a protection method: a classifier constructed from the protected (public) dataset is used to predict the confidential attribute values of original records. After defining this approach in detail, we describe experiments that show both the power and the danger of the approach: very popular protection methods suffer from very high attribute disclosure risk values.
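The idea can be illustrated with a minimal sketch: the attacker trains a classifier on the released dataset (perturbed non-confidential attributes as features, published confidential attribute as label) and then applies it to the original attribute values of target records; the fraction of correct guesses is an empirical estimate of the attribute disclosure risk. The function name, the choice of a decision tree, and the synthetic additive-noise data below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a classifier-based attribute disclosure risk estimate (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def estimate_attribute_disclosure_risk(X_protected, y_confidential, X_original, y_true):
    """Train a classifier on the released (protected) dataset and measure how
    often it recovers the confidential attribute of the original records."""
    clf = DecisionTreeClassifier().fit(X_protected, y_confidential)
    predictions = clf.predict(X_original)
    return float(np.mean(predictions == y_true))  # fraction of correct guesses

# Tiny synthetic example: additive-noise "protection" of two numeric attributes.
rng = np.random.default_rng(0)
X_original = rng.normal(size=(500, 2))                                    # original non-confidential attributes
y_confidential = (X_original[:, 0] + X_original[:, 1] > 0).astype(int)    # confidential attribute, published in clear
X_protected = X_original + rng.normal(scale=0.1, size=X_original.shape)   # perturbed, released attributes

risk = estimate_attribute_disclosure_risk(X_protected, y_confidential, X_original, y_confidential)
print(f"Estimated attribute disclosure risk: {risk:.2f}")
```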
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Nin, J., Herranz, J., Torra, V. (2010). Using Classification Methods to Evaluate Attribute Disclosure Risk. In: Torra, V., Narukawa, Y., Daumas, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2010. Lecture Notes in Computer Science, vol. 6408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16292-3_27
DOI: https://doi.org/10.1007/978-3-642-16292-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16291-6
Online ISBN: 978-3-642-16292-3