An Efficient Clustering Algorithm for k-Anonymisation

  • Regular Paper
Journal of Computer Science and Technology

Abstract

K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.
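As a minimal illustration of the privacy property the abstract refers to, the sketch below checks the standard k-anonymity condition: every combination of quasi-identifier values must be shared by at least k records. It is not the clustering method proposed in the paper. The toy table, the attribute names, and the helpers `is_k_anonymous` and `generalise_age` are assumptions made up for this example.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if each quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def generalise_age(row, width=10):
    """Replace an exact age with a range of the given width (hypothetical helper)."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy table: age and postcode act as quasi-identifiers,
# diagnosis is the sensitive attribute.
raw = [
    {"age": 34, "postcode": "CF10", "diagnosis": "flu"},
    {"age": 36, "postcode": "CF10", "diagnosis": "asthma"},
    {"age": 52, "postcode": "CF24", "diagnosis": "flu"},
    {"age": 58, "postcode": "CF24", "diagnosis": "diabetes"},
]

# Generalising ages into ten-year ranges makes pairs of records indistinguishable.
anonymised = [generalise_age(row) for row in raw]

print(is_k_anonymous(raw, ["age", "postcode"], 2))         # False
print(is_k_anonymous(anonymised, ["age", "postcode"], 2))  # True
```

Coarser ranges give stronger protection but less precise query answers, which is the utility and privacy trade-off the abstract describes.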

Author information

Corresponding author

Correspondence to Grigorios Loukides.

About this article

Cite this article

Loukides, G., Shao, JH. An Efficient Clustering Algorithm for k-Anonymisation. J. Comput. Sci. Technol. 23, 188–202 (2008). https://doi.org/10.1007/s11390-008-9121-3

  • DOI: https://doi.org/10.1007/s11390-008-9121-3
