Article

Approximate algorithms for K-anonymity

Authors:

Hyoungmin Park,

Kyuseok Shim

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Pages 67 - 78

https://doi.org/10.1145/1247480.1247490

Published: 11 June 2007

Abstract

When a table containing individual data is published, disclosure of sensitive information should be prevented. A naive approach to the problem is to remove identifiers such as name and social security number. However, linking attacks, which join the published table with other tables on a set of attributes called the quasi-identifier, may still reveal the sensitive information. To protect privacy against linking attacks, the notion of k-anonymity, which makes each record in the table indistinguishable from k-1 other records, has been proposed previously. It has been shown to be NP-hard to k-anonymize a table while minimizing the number of suppressed cells. To cope with this hardness, O(k log k)-approximation and O(k)-approximation algorithms were proposed in previous work.
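To make the definition concrete, here is a minimal sketch of checking k-anonymity and counting suppressed cells on a toy table. It is not the paper's algorithm: the helper names (`is_k_anonymous`, `suppress_group`, `suppression_cost`), the `*` marker for a suppressed cell, the sample rows, and the hand-picked grouping are all assumptions made for illustration.

```python
from collections import Counter

SUPPRESSED = "*"  # assumed marker for a suppressed cell


def is_k_anonymous(rows, k):
    """True if every quasi-identifier tuple appears at least k times."""
    counts = Counter(tuple(r) for r in rows)
    return all(c >= k for c in counts.values())


def suppress_group(rows):
    """Suppress each column on which the rows of one group disagree,
    so that all rows in the group become identical."""
    cols = range(len(rows[0]))
    differing = {c for c in cols if len({r[c] for r in rows}) > 1}
    return [
        tuple(SUPPRESSED if c in differing else v for c, v in enumerate(r))
        for r in rows
    ]


def suppression_cost(rows):
    """Number of suppressed cells, the quantity being minimized."""
    return sum(v == SUPPRESSED for r in rows for v in r)


# Hypothetical quasi-identifier columns: (zip code, age, sex).
table = [
    ("13053", "28", "F"),
    ("13053", "29", "F"),
    ("14850", "50", "M"),
    ("14850", "50", "F"),
]

# Hand-picked partition into groups of size 2; choosing the partition
# that minimizes the cost is the NP-hard part.
anonymized = suppress_group(table[:2]) + suppress_group(table[2:])

print(is_k_anonymous(table, 2))
print(is_k_anonymous(anonymized, 2))
print(suppression_cost(anonymized))
```

The sketch only evaluates one given grouping; the approximation algorithms discussed in the abstract concern how to choose the grouping so that the suppression cost stays close to the optimum.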

In this paper, we propose several approximation algorithms that guarantee an O(log k) approximation ratio and perform significantly better than the traditional algorithms. We also provide O(β log k)-approximation algorithms that gracefully adjust their running time according to the tolerance β (≥ 1) of the approximation ratio. Experimental results confirm that our approximation algorithms perform significantly better than the traditional approximation algorithms.
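For readers unfamiliar with the term, the guarantees above can be read as follows. This is the standard textbook meaning of an approximation ratio, not notation taken from the paper; the symbols ALG(T), ALG_β(T), OPT(T), and the constant c are introduced here for illustration, and the tolerance parameter is written β.

```latex
\[
  \mathrm{ALG}(T) \;\le\; c \cdot \log k \cdot \mathrm{OPT}(T)
  \qquad\text{and}\qquad
  \mathrm{ALG}_{\beta}(T) \;\le\; c \cdot \beta \cdot \log k \cdot \mathrm{OPT}(T),
  \quad \beta \ge 1 .
\]
```

Here OPT(T) denotes the minimum number of suppressed cells over all k-anonymizations of a table T, and ALG(T) is the number of cells suppressed by the algorithm. Under this reading, a larger tolerance permits a looser bound on the cost, and the abstract states that the running time adjusts with it.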

Published In

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

June 2007

1210 pages

ISBN:9781595936868

DOI:10.1145/1247480

General Chairs: Lizhu Zhou (Tsinghua University, China), Tok Wang Ling (National University of Singapore, Singapore)

Program Chair: Beng Chin Ooi (National University of Singapore, Singapore)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS07

SIGMOD/PODS07: International Conference on Management of Data

June 11 - 14, 2007

Beijing, China

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
