Abstract
In this paper we present extended definitions of k-anonymity and use them to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. Our extension provides a tool that measures the amount of anonymity retained during data mining. We show that our model can be applied to various data mining problems, such as classification, association rule mining and clustering. We describe two data mining algorithms which exploit our extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, we show that our method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.
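For orientation, the following is a minimal sketch of the standard table-level k-anonymity check that the extended definitions build on. It does not implement the paper's extension to data mining models. The function names, the attribute names, and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def anonymity_level(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values on all quasi-identifier attributes (0 for an empty table)."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """Standard table-level check: every combination of quasi-identifier
    values that appears in the table appears in at least k rows."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


# Hypothetical, already-generalized toy table (illustrative values only).
rows = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ("zip", "age")

level = anonymity_level(rows, QUASI_IDENTIFIERS)
two_anonymous = is_k_anonymous(rows, QUASI_IDENTIFIERS, 2)
three_anonymous = is_k_anonymous(rows, QUASI_IDENTIFIERS, 3)
```

In this toy table the smallest quasi-identifier group has two rows, so the table is 2-anonymous but not 3-anonymous. The abstract's contribution concerns the analogous guarantee for the output of a mining algorithm rather than for a released table.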
Cite this article
Friedman, A., Wolff, R. & Schuster, A. Providing k-anonymity in data mining. The VLDB Journal 17, 789–804 (2008). https://doi.org/10.1007/s00778-006-0039-5
DOI: https://doi.org/10.1007/s00778-006-0039-5