Abstract
K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms, and propose sampling-based techniques to set the method's parameters optimally. Extensive experiments show that the extended method achieves good accuracy in query answering and effectively prevents linking attacks.
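In clustering-based k-anonymisation generally, records are grouped into clusters of at least k members, and each cluster's quasi-identifier values are replaced by a common generalised value, so every released row is indistinguishable from at least k-1 others. The sketch below illustrates only that generic idea. It is not the authors' algorithm: it does not implement the paper's utility/privacy clustering criteria or its sampling-based parameter setting. Its assumptions are that all quasi-identifiers are numeric, that distance is a range-normalised Manhattan distance, that seeds are chosen greedily as the record farthest from the previous seed, and that utility is scored with a simple interval-width (NCP-style) loss. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]             # numeric quasi-identifier values only (assumption)
Box = Tuple[Tuple[float, float], ...]  # one (min, max) interval per attribute


def attribute_ranges(records: Sequence[Record]) -> List[float]:
    """Width of each attribute's domain, used to normalise distance and loss."""
    return [max(r[i] for r in records) - min(r[i] for r in records)
            for i in range(len(records[0]))]


def distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Range-normalised Manhattan distance (an illustrative choice of metric)."""
    return sum(abs(x - y) / w for x, y, w in zip(a, b, ranges) if w > 0)


def greedy_k_clusters(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Greedily partition records into clusters with at least k members each."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    ranges = attribute_ranges(records)
    remaining = list(records)
    clusters: List[List[Record]] = []
    seeds: List[Record] = []
    seed = remaining[0]
    while len(remaining) >= k:
        remaining.remove(seed)
        # Grow the cluster with the k-1 remaining records closest to the seed.
        remaining.sort(key=lambda r: distance(seed, r, ranges))
        clusters.append([seed] + remaining[:k - 1])
        seeds.append(seed)
        remaining = remaining[k - 1:]
        if remaining:
            # Next seed: the remaining record farthest from the current seed.
            seed = max(remaining, key=lambda r: distance(seed, r, ranges))
    # Fewer than k records left: attach each one to the cluster with the nearest seed.
    for r in remaining:
        j = min(range(len(clusters)), key=lambda i: distance(seeds[i], r, ranges))
        clusters[j].append(r)
    return clusters


def bounding_box(cluster: Sequence[Record]) -> Box:
    """Generalise a cluster to the [min, max] interval of each attribute."""
    return tuple((min(col), max(col)) for col in zip(*cluster))


def anonymise(records: Sequence[Record], k: int) -> List[Box]:
    """Release one generalised row per input record, grouped by cluster."""
    table: List[Box] = []
    for cluster in greedy_k_clusters(records, k):
        table.extend([bounding_box(cluster)] * len(cluster))
    return table


def is_k_anonymous(table: Sequence[Box], k: int) -> bool:
    """Every distinct generalised row must occur at least k times."""
    return all(count >= k for count in Counter(table).values())


def information_loss(clusters: Sequence[Sequence[Record]],
                     records: Sequence[Record]) -> float:
    """Average normalised interval width over all cells (0 = no loss, 1 = maximal)."""
    ranges = attribute_ranges(records)
    total = sum(len(c) * sum((hi - lo) / w
                             for (lo, hi), w in zip(bounding_box(c), ranges) if w > 0)
                for c in clusters)
    return total / (len(records) * len(ranges))


if __name__ == "__main__":
    # Hypothetical (age, salary) quasi-identifiers.
    data = [(25, 50000), (26, 51000), (27, 52000), (60, 90000),
            (62, 91000), (61, 95000), (40, 70000)]
    released = anonymise(data, k=3)
    print(released)
    print(is_k_anonymous(released, 3))
    print(information_loss(greedy_k_clusters(data, 3), data))
```

Smaller clusters give narrower intervals and hence better utility, while the constraint of at least k members per cluster provides the privacy guarantee; this trade-off is the tension the abstract describes, although the paper's own criteria for managing it are not reproduced here.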
References
Li N, Li T, Venkatasubramanian S. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proc. ICDE, Istanbul, Turkey, 2007, pp.106–115.
Loukides G, Shao J. Speeding up clustering-based k-anonymisation algorithms with pre-partitioning. In Proc. The 24th British National Conference on Databases, Glasgow, UK, 2007, pp.203–214.
Loukides G, Shao J. Capturing data usefulness and privacy protection in K-anonymisation. In Proc. The 22nd Annual ACM Symposium on Applied Computing, Seoul, Korea, 2007, pp.370–374.
Sweeney L. K-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 2002, 10(5): 557–570.
Samarati P. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 2001, 13(9): 1010–1027.
LeFevre K, DeWitt D J, Ramakrishnan R. Mondrian multi-dimensional K-anonymity. In Proc. ICDE, Atlanta, Georgia, USA, 2006, p.25.
Bayardo R J, Agrawal R. Data privacy through optimal k-anonymization. In Proc. ICDE, Tokyo, Japan, 2005, pp.217–228.
Iyengar V S. Transforming data to satisfy privacy constraints. In Proc. KDD, Edmonton, Alberta, Canada, 2002, pp.279–288.
LeFevre K, DeWitt D J, Ramakrishnan R. Workload-aware anonymization. In Proc. KDD, Philadelphia, PA, USA, 2006, pp.277–286.
Fung B C M, Wang K, Yu P S. Top-down specialization for information and privacy preservation. In Proc. ICDE, Tokyo, Japan, 2005, pp.205–216.
Teng Z, Du W. Comparisons of k-anonymization and randomization schemes under linking attacks. In Proc. ICDM, Hong Kong, China, 2006, pp.1091–1096.
Machanavajjhala A, Gehrke J, Kifer D et al. l-diversity: Privacy beyond k-anonymity. In Proc. ICDE, Atlanta, Georgia, USA, 2006, p.24.
Koudas N, Zhang Q, Srivastava D et al. Aggregate query answering on anonymized tables. In Proc. ICDE, Istanbul, Turkey, 2007, pp.116–125.
Hettich S, Merz C J. UCI Repository of machine learning databases, 1999, http://kdd.ics.uci.edu.
LeFevre K, DeWitt D J, Ramakrishnan R. Incognito: Efficient full-domain K-anonymity. In Proc. SIGMOD, Baltimore, Maryland, USA, 2005, pp.49–60.
Xu J, Wang W, Pei J et al. Utility-based anonymization using local recoding. In Proc. KDD, Philadelphia, PA, USA, 2006, pp.785–790.
Aggarwal C C, Yu P S. A condensation approach to privacy preserving data mining. In Proc. The 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, 2004, pp.183–199.
Byun J, Kamra A, Bertino E et al. Efficient k-anonymization using clustering techniques. In Proc. The 12th International Conference on Database Systems for Advanced Applications, Bangkok, Thailand, 2007, pp.188–200.
Zhou J, Sander J. Data bubbles for non-vector data: Speeding-up hierarchical clustering in arbitrary metric spaces. In Proc. VLDB, Berlin, Germany, 2003, pp.452–463.
Narayan B L, Murthy C A, Pal S K. Maxdiff kd-trees for data condensation. Pattern Recognition Letters, 27(3): 187–200.
Xiao X, Tao Y. Anatomy: Simple and effective privacy preservation. In Proc. VLDB, Seoul, Korea, 2006, pp.139–150.
Gehrke J, Ramakrishnan R, Ganti V. RainForest: A framework for fast decision tree construction of large datasets. In Proc. VLDB, New York City, USA, 1998, pp.416–427.
Cite this article
Loukides, G., Shao, JH. An Efficient Clustering Algorithm for k-Anonymisation. J. Comput. Sci. Technol. 23, 188–202 (2008). https://doi.org/10.1007/s11390-008-9121-3