
Handicapping attacker's confidence: an alternative to k-anonymization

  • Regular Paper
Knowledge and Information Systems

Abstract

We present an approach to limiting the confidence of inferring sensitive properties, to protect against the threats posed by data mining. The problem has dual goals: preserve the information needed for a wanted data analysis request, and limit the usefulness of unwanted sensitive inferences that may be derived from the released data. Sensitive inferences are specified by a set of “privacy templates”. Each template specifies the sensitive property to be protected, the attributes identifying a group of individuals, and a maximum threshold for the confidence of inferring the sensitive property given the identifying attributes. We show that suppressing domain values monotonically decreases the maximum confidence of such sensitive inferences. Hence, we propose a data transformation that minimally suppresses the domain values in the data to satisfy the set of privacy templates. The transformed data is free of sensitive inferences even in the presence of data mining algorithms. Prior work on k-anonymization focuses on personal identities; this work focuses on the association between personal identities and sensitive properties.
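The abstract's two technical claims (a confidence threshold per template, and suppression never raising the maximum confidence) can be illustrated with a small Python sketch. It rests on assumptions not stated in the abstract: a template is modeled as a tuple (identifying attributes X, sensitive attribute, a single sensitive value y, threshold h); conf(x → y) is taken as the fraction of records whose X-values equal x that also carry y; and all suppressed values of an attribute collapse into one marker treated as an ordinary domain value. Under that assumption, suppressing a value merges groups, and a merged group's confidence (a1 + a2) / (n1 + n2) lies between a1/n1 and a2/n2, so the maximum cannot rise. The names `max_confidence`, `suppress`, `greedy_suppress`, `SUPPRESSED` and the toy records are hypothetical, and the greedy loop is a naive illustration, not the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical marker: all suppressed values of an attribute collapse into it.
SUPPRESSED = "⊥"


def max_confidence(records, x_attrs, s_attr, s_value):
    """Max over groups x of conf(x -> y) = |T[x and y]| / |T[x]|."""
    size = defaultdict(int)  # |T[x]|: records whose X-values equal x
    hits = defaultdict(int)  # |T[x and y]|: those that also carry y
    for r in records:
        x = tuple(r[a] for a in x_attrs)
        size[x] += 1
        if r[s_attr] == s_value:
            hits[x] += 1
    return max(hits[x] / size[x] for x in size)


def suppress(records, attr, value):
    """Replace every occurrence of `value` on `attr` with the marker."""
    return [{**r, attr: SUPPRESSED} if r[attr] == value else dict(r) for r in records]


def greedy_suppress(records, template):
    """Naive illustration only: suppress one value at a time, picking the
    value whose suppression gives the lowest max confidence, until the
    threshold h holds or nothing is left to suppress."""
    x_attrs, s_attr, s_value, h = template
    current, done = records, []
    while max_confidence(current, x_attrs, s_attr, s_value) > h:
        candidates = sorted({(a, r[a]) for r in current for a in x_attrs
                             if r[a] != SUPPRESSED})
        if not candidates:
            break  # everything suppressed; the template cannot be met
        # min() keeps the first candidate among ties, so sorting makes this deterministic
        best = min(candidates, key=lambda av: max_confidence(
            suppress(current, *av), x_attrs, s_attr, s_value))
        current = suppress(current, *best)
        done.append(best)
    return current, done


records = [
    {"Job": "Engineer", "Disease": "HIV"},
    {"Job": "Engineer", "Disease": "HIV"},
    {"Job": "Engineer", "Disease": "Flu"},
    {"Job": "Lawyer", "Disease": "Flu"},
    {"Job": "Lawyer", "Disease": "Flu"},
    {"Job": "Dancer", "Disease": "Flu"},
]
template = (("Job",), "Disease", "HIV", 0.5)  # <X -> y, h>

print(max_confidence(records, ("Job",), "Disease", "HIV"))  # 0.6666666666666666
released, suppressed = greedy_suppress(records, template)
print(suppressed)  # [('Job', 'Dancer'), ('Job', 'Engineer')]
print(max_confidence(released, ("Job",), "Disease", "HIV"))  # 0.5
```

On this toy table the Engineer group alone gives a confidence of 2/3, which exceeds h = 0.5; merging it with another job under the marker pulls the maximum down to 0.5. The greedy loop suppresses Dancer first only because of a tie, so it is not guaranteed to find a minimal suppression, and the paper's own search procedure may differ.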


Author information

Corresponding author

Correspondence to Ke Wang.

Additional information

Ke Wang received his Ph.D. from the Georgia Institute of Technology. He is currently a professor at the School of Computing Science, Simon Fraser University. Before joining Simon Fraser, he was an associate professor at the National University of Singapore. He has taught in the areas of databases and data mining. Dr. Wang’s research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with a recent focus on the end use of data mining. This includes explicitly modeling the business goal (as in profit mining, bio-mining and web mining) and exploiting user prior knowledge (as in extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of fields such as databases, statistics, machine learning and optimization to provide actionable solutions to real-life problems. He is an associate editor of the IEEE TKDE journal and has served on the program committees of international conferences.

Benjamin C. M. Fung received his B.Sc. and M.Sc. degrees in computing science from Simon Fraser University. A recipient of the postgraduate scholarship doctoral award from the Natural Sciences and Engineering Research Council of Canada (NSERC), Mr. Fung is currently a Ph.D. candidate at Simon Fraser. His recent research interests include privacy-preserving data mining, secure distributed computing, and text mining. Before pursuing his Ph.D., he worked in the R&D Department at Business Objects, where he designed reporting systems for various Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) systems, including BaaN, Siebel, and PeopleSoft. Mr. Fung has published in data engineering, data mining, and security conferences, journals, and books, including IEEE ICDE, IEEE ICDM, IEEE ISI, SDM, KAIS, and the Encyclopedia of Data Warehousing and Mining.

Philip S. Yu received the B.S. degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University. He is with the IBM T.J. Watson Research Center and is currently manager of the Software Tools and Techniques group. Dr. Yu has published more than 450 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and the IEEE. He has received several IBM honors, including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards and the 85th plateau of Invention Achievement Awards. He received a Research Contributions Award from the IEEE International Conference on Data Mining in 2003 and an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor.

About this article

Cite this article

Wang, K., Fung, B.C.M. & Yu, P.S. Handicapping attacker's confidence: an alternative to k-anonymization. Knowl Inf Syst 11, 345–368 (2007). https://doi.org/10.1007/s10115-006-0035-5

  • DOI: https://doi.org/10.1007/s10115-006-0035-5
