Abstract
We present an approach to limiting the confidence of inferring sensitive properties, protecting against the threats posed by data mining capabilities. The problem has dual goals: preserve the information needed for a wanted data analysis request and limit the usefulness of unwanted sensitive inferences that may be derived from the released data. Sensitive inferences are specified by a set of “privacy templates”. Each template specifies the sensitive property to be protected, the attributes identifying a group of individuals, and a maximum threshold on the confidence of inferring the sensitive property given the identifying attributes. We show that suppressing domain values monotonically decreases the maximum confidence of such sensitive inferences. Hence, we propose a data transformation that minimally suppresses the domain values in the data to satisfy the set of privacy templates. The transformed data is free of sensitive inferences even in the presence of data mining algorithms. The prior k-anonymization focuses on personal identities; this work focuses on the association between personal identities and sensitive properties.
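To make the template idea concrete, the following is a minimal sketch (not the authors' algorithm) of checking a privacy template and of the monotonicity claim: a template bounds the maximum confidence of inferring a sensitive value from a set of identifying attributes, and suppressing an identifying attribute merges groups, which can only lower that maximum. The record values, attribute names, and helper functions below are illustrative assumptions.

```python
from collections import Counter, defaultdict

def max_confidence(records, id_attrs, sensitive_attr, sensitive_value):
    """Maximum, over the groups induced by id_attrs, of
    P(sensitive_attr = sensitive_value | group)."""
    groups = defaultdict(Counter)
    for r in records:
        key = tuple(r[a] for a in id_attrs)
        groups[key][r[sensitive_attr]] += 1
    return max(counts[sensitive_value] / sum(counts.values())
               for counts in groups.values())

def suppress(records, attr):
    """Suppress attr by mapping every one of its values to one marker,
    merging the groups that differed only on attr."""
    return [{**r, attr: "*"} for r in records]

# Illustrative microdata (hypothetical values).
records = [
    {"Job": "Banker", "Country": "UK", "Disease": "HIV"},
    {"Job": "Banker", "Country": "UK", "Disease": "Flu"},
    {"Job": "Clerk",  "Country": "UK", "Disease": "HIV"},
    {"Job": "Clerk",  "Country": "US", "Disease": "Flu"},
]

# Privacy template: identifying attributes, sensitive property, threshold h.
id_attrs, sens, val, h = ("Job", "Country"), "Disease", "HIV", 0.6

c0 = max_confidence(records, id_attrs, sens, val)                # 1.0  (violates h)
r1 = suppress(records, "Job")
c1 = max_confidence(r1, id_attrs, sens, val)                     # 0.667 (still > h)
r2 = suppress(r1, "Country")
c2 = max_confidence(r2, id_attrs, sens, val)                     # 0.5  (satisfies h)
print(c0, c1, c2, c2 <= h)
```

Each suppression here weakly decreases the maximum confidence (1.0 → 0.667 → 0.5), mirroring the monotonicity property; the paper's contribution is to choose such suppressions minimally, which this sketch does not attempt.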
Author information
Ke Wang received his Ph.D. from the Georgia Institute of Technology. He is currently a professor in the School of Computing Science at Simon Fraser University. Before joining Simon Fraser, he was an associate professor at the National University of Singapore. He has taught in the areas of databases and data mining. Dr. Wang’s research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with recent interests focusing on the end use of data mining. This includes explicitly modeling the business goal (such as profit mining, bio-mining, and web mining) and exploiting users’ prior knowledge (such as extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of various fields, such as databases, statistics, machine learning, and optimization, to provide actionable solutions to real-life problems. He is an associate editor of the IEEE TKDE journal and has served on the program committees of international conferences.
Benjamin C. M. Fung received his B.Sc. and M.Sc. degrees in computing science from Simon Fraser University. A recipient of the postgraduate scholarship doctoral award from the Natural Sciences and Engineering Research Council of Canada (NSERC), Mr. Fung is currently a Ph.D. candidate at Simon Fraser. His recent research interests include privacy-preserving data mining, secure distributed computing, and text mining. Before pursuing his Ph.D., he worked in the R&D Department at Business Objects, designing reporting systems for various Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) platforms, including BaaN, Siebel, and PeopleSoft. Mr. Fung has published in data engineering, data mining, and security conferences, journals, and books, including IEEE ICDE, IEEE ICDM, IEEE ISI, SDM, KAIS, and the Encyclopedia of Data Warehousing and Mining.
Philip S. Yu received his B.S. degree in E.E. from National Taiwan University, M.S. and Ph.D. degrees in E.E. from Stanford University, and an M.B.A. degree from New York University. He is with the IBM T.J. Watson Research Center, where he is currently manager of the Software Tools and Techniques group. Dr. Yu has published more than 450 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and the IEEE. He has received several IBM honors, including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards, and the 85th plateau of Invention Achievement Awards. He received a Research Contributions Award from the IEEE International Conference on Data Mining in 2003 and an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor.
Cite this article
Wang, K., Fung, B.C.M. & Yu, P.S. Handicapping attacker's confidence: an alternative to k-anonymization. Knowl Inf Syst 11, 345–368 (2007). https://doi.org/10.1007/s10115-006-0035-5