Efficient systematic clustering method for k-anonymization

  • Original Article
Acta Informatica

Abstract

This paper presents a clustering-based k-anonymization technique that minimizes information loss while assuring data quality. (Clustering partitions records into clusters such that records within a cluster are similar to each other, while records in different clusters are as distinct from one another as possible.) Privacy preservation of individuals has drawn considerable interest in data mining research. The k-anonymity model proposed by Samarati and Sweeney is a practical approach to data privacy preservation and has been studied extensively over the last few years. Anonymization methods based on generalization or suppression can protect private information, but they lose valuable information. The challenge is how to minimize the information loss during the anonymization process. We refer to this challenge as the systematic clustering problem for k-anonymization, which is analysed in this paper. The proposed technique groups similar data together and then anonymizes each group individually. The structure of the systematic clustering problem is defined and investigated through paradigm and properties. An algorithm for the problem is developed, and its time complexity is shown to be in \({O(\frac{n^{2}}{k})}\), where n is the total number of records containing information about individuals whose privacy is of concern. Experimental results show that our method attains a reasonable dominance with respect to both information loss and execution time. Finally, the algorithm is shown to be usable for incremental datasets.
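The idea of grouping similar records and then anonymizing each group can be illustrated with a minimal sketch. It is not the systematic clustering algorithm of the paper, which is not reproduced on this page. It assumes purely numeric quasi-identifiers, a Manhattan distance, and a simple greedy grouping. The names `distance`, `cluster_records` and `generalize` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Record = Sequence[float]
Range = Tuple[float, float]


def distance(a: Record, b: Record) -> float:
    """Manhattan distance between two numeric quasi-identifier vectors (an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_records(records: List[Record], k: int) -> List[List[int]]:
    """Greedily build clusters (lists of record indices), each of size at least k."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Order the unassigned records by similarity to the seed record.
        remaining.sort(key=lambda i: distance(records[seed], records[i]))
        clusters.append([seed] + remaining[: k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the cluster with the closest seed.
    for i in remaining:
        nearest = min(clusters, key=lambda c: distance(records[c[0]], records[i]))
        nearest.append(i)
    return clusters


def generalize(records: List[Record], clusters: List[List[int]]) -> List[Tuple[Range, ...]]:
    """Replace every value by the (min, max) range of its attribute within the cluster."""
    anonymized: List[Tuple[Range, ...]] = [()] * len(records)
    for members in clusters:
        columns = zip(*(records[i] for i in members))
        ranges = tuple((min(col), max(col)) for col in columns)
        for i in members:
            anonymized[i] = ranges
    return anonymized


# Toy data: (age, zip-code suffix) pairs, invented for this example.
records = [(25, 10), (27, 12), (26, 11), (60, 50), (62, 52), (61, 51), (40, 30)]
clusters = cluster_records(records, k=3)
for row in generalize(records, clusters):
    print(row)
```

Because every cluster holds at least k records and all of its members receive the same generalized ranges, each released row is indistinguishable from at least k − 1 others on the quasi-identifiers. The width of the ranges gives a rough sense of the information lost, which is the quantity a clustering-based method tries to keep small.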

References

  1. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: International Conference on Data Engineering (2005)

  2. Byun, J.W., Bertino, E.: Micro-views, or on how to protect privacy while enhancing data usability: concepts and challenges. SIGMOD 35(1), 9–13 (2006)

  3. Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient k-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA) (2007)

  4. Byun, J.W., Sohn, Y., Bertino, E., Li, N.: Secure anonymization for incremental datasets. In: 3rd VLDB Workshop on Secure Data Management (SDM) (2006)

  5. Chiu, C.-C., Tsai, C.-Y.: A k-anonymity clustering method for effective data privacy preservation. In: Third International Conference on Advanced Data Mining and Applications (ADMA) (2007)

  6. Ciriani, V., di Vimercati, S.D.C., Foresti, S., Samarati, P.: k-anonymous data mining: a survey. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, pp. 103–134. Kluwer Academic Publishers, Boston (2008)

  7. Fung, B.C.M., Wang, K., Yu, P.S.: Top-down specialization for information and privacy preservation. In: International Conference on Data Engineering (2005)

  8. Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)

  9. Hettich, C.B.S., Merz, C.: UCI repository of machine learning databases (1998)

  10. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: SIGKDD (2002)

  11. LeFevre, K., DeWitt, D., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: ACM International Conference on Management of Data (2005)

  12. LeFevre, K., DeWitt, D., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: International Conference on Data Engineering (2006)

  13. Li, N., Li, T.: t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE (2007)

  14. Lin, J.L., Wei, M.C.: An efficient clustering method for k-anonymization. In: Proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society (2008)

  15. Loukides, G., Shao, J.: Capturing data usefulness and privacy protection in k-anonymisation. In: Proceedings of the 2007 ACM Symposium on Applied Computing (2007)

  16. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramanian, M.: l-diversity: privacy beyond k-anonymity. In: ICDE (2006)

  17. Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: PODS, pp. 223–228 (2004)

  18. Samarati, P.: Protecting respondent’s privacy in microdata release. TKDE 13(6) (2001)

  19. Solanas, A., Sebe, F., Domingo-Ferrer, J.: Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: International Workshop on Privacy and Anonymity in the Information Society (2008)

  20. Sun, X., Li, M., Wang, H., Plank, A.: An efficient hash-based algorithm for minimal k-anonymity. In: ACSC, pp. 101–107 (2008)

  21. Sun, X., Wang, H., Li, J.: Priority driven K-Anonymisation for privacy protection. In: AusDM, pp. 73–78 (2008)

  22. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertainty Fuzziness Knowledge-based Syst. 10(5), 571–588 (2002)

  23. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness Knowledge-based Syst. 10(5), 557–570 (2002)

  24. Truta, T., Vinay, B.: Privacy protection: p-sensitive k-anonymity property. In: International Workshop on Privacy Data Management (PDM), p. 94 (2006)

  25. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: KDD 2006, pp. 785–790 (2006)

  26. Wong, R.C.-W., Li, J., Fu, A.W.-C., Wang, K.: (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006)

Author information

Corresponding author

Correspondence to Elisa Bertino.

About this article

Cite this article

Kabir, M.E., Wang, H. & Bertino, E. Efficient systematic clustering method for k-anonymization. Acta Informatica 48, 51–66 (2011). https://doi.org/10.1007/s00236-010-0131-6

  • DOI: https://doi.org/10.1007/s00236-010-0131-6
