Abstract
We present an approach to limiting the confidence of inferring sensitive properties, protecting against the threats posed by data mining capabilities. The problem has dual goals: preserve the information needed for a wanted data analysis request and limit the usefulness of unwanted sensitive inferences that may be derived from the released data. Sensitive inferences are specified by a set of “privacy templates”. Each template specifies the sensitive property to be protected, the attributes identifying a group of individuals, and a maximum threshold on the confidence of inferring the sensitive property given the identifying attributes. We show that suppressing domain values monotonically decreases the maximum confidence of such sensitive inferences. Hence, we propose a data transformation that minimally suppresses the domain values in the data to satisfy the set of privacy templates. The transformed data is free of sensitive inferences even in the presence of data mining algorithms. The prior k-anonymization focuses on personal identities; this work focuses on the association between personal identities and sensitive properties.
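To make the template idea concrete, the following is a minimal sketch (not the authors' algorithm) of checking a privacy template and of the monotonicity claim: a template bounds the maximum confidence of inferring a sensitive value from a set of identifying attributes, and suppressing an identifying attribute merges groups, which can only lower that maximum. The record values, attribute names, and helper functions below are illustrative assumptions.

```python
from collections import Counter, defaultdict

def max_confidence(records, id_attrs, sensitive_attr, sensitive_value):
    """Maximum, over the groups induced by id_attrs, of
    P(sensitive_attr = sensitive_value | group)."""
    groups = defaultdict(Counter)
    for r in records:
        key = tuple(r[a] for a in id_attrs)
        groups[key][r[sensitive_attr]] += 1
    return max(counts[sensitive_value] / sum(counts.values())
               for counts in groups.values())

def suppress(records, attr):
    """Suppress attr by mapping every one of its values to one marker,
    merging the groups that differed only on attr."""
    return [{**r, attr: "*"} for r in records]

# Illustrative microdata (hypothetical values).
records = [
    {"Job": "Banker", "Country": "UK", "Disease": "HIV"},
    {"Job": "Banker", "Country": "UK", "Disease": "Flu"},
    {"Job": "Clerk",  "Country": "UK", "Disease": "HIV"},
    {"Job": "Clerk",  "Country": "US", "Disease": "Flu"},
]

# Privacy template: identifying attributes, sensitive property, threshold h.
id_attrs, sens, val, h = ("Job", "Country"), "Disease", "HIV", 0.6

c0 = max_confidence(records, id_attrs, sens, val)                # 1.0  (violates h)
r1 = suppress(records, "Job")
c1 = max_confidence(r1, id_attrs, sens, val)                     # 0.667 (still > h)
r2 = suppress(r1, "Country")
c2 = max_confidence(r2, id_attrs, sens, val)                     # 0.5  (satisfies h)
print(c0, c1, c2, c2 <= h)
```

Each suppression here weakly decreases the maximum confidence (1.0 → 0.667 → 0.5), mirroring the monotonicity property; the paper's contribution is to choose such suppressions minimally, which this sketch does not attempt.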
Author information
Ke Wang received his Ph.D. from the Georgia Institute of Technology. He is currently a professor in the School of Computing Science at Simon Fraser University. Before joining Simon Fraser, he was an associate professor at the National University of Singapore. He has taught in the areas of databases and data mining. Dr. Wang’s research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with recent interests focusing on the end use of data mining. This includes explicitly modeling the business goal (such as profit mining, bio-mining, and web mining) and exploiting users’ prior knowledge (such as extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of various fields, such as databases, statistics, machine learning, and optimization, to provide actionable solutions to real-life problems. He is an associate editor of the IEEE TKDE journal and has served on the program committees of international conferences.
Benjamin C. M. Fung received his B.Sc. and M.Sc. degrees in computing science from Simon Fraser University. A recipient of the postgraduate scholarship doctoral award from the Natural Sciences and Engineering Research Council of Canada (NSERC), Mr. Fung is currently a Ph.D. candidate at Simon Fraser. His recent research interests include privacy-preserving data mining, secure distributed computing, and text mining. Before pursuing his Ph.D., he worked in the R&D Department at Business Objects, designing reporting systems for various Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) platforms, including BaaN, Siebel, and PeopleSoft. Mr. Fung has published in data engineering, data mining, and security conferences, journals, and books, including IEEE ICDE, IEEE ICDM, IEEE ISI, SDM, KAIS, and the Encyclopedia of Data Warehousing and Mining.
Philip S. Yu received his B.S. degree in E.E. from National Taiwan University, M.S. and Ph.D. degrees in E.E. from Stanford University, and an M.B.A. degree from New York University. He is with the IBM T.J. Watson Research Center, where he is currently manager of the Software Tools and Techniques group. Dr. Yu has published more than 450 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and the IEEE. He has received several IBM honors, including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards, and the 85th plateau of Invention Achievement Awards. He received a Research Contributions Award from the IEEE International Conference on Data Mining in 2003 and an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor.
Cite this article
Wang, K., Fung, B.C.M. & Yu, P.S. Handicapping attacker's confidence: an alternative to k-anonymization. Knowl Inf Syst 11, 345–368 (2007). https://doi.org/10.1007/s10115-006-0035-5