research-article

k-Anonymization by Freeform Generalization

Authors:

Dimitrios Tsoumakos,

Panagiotis Karras

ASIA CCS '15: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security

Pages 519 - 530

https://doi.org/10.1145/2714576.2714590

Published: 14 April 2015

Abstract

Syntactic data anonymization strives to (i) ensure that an adversary cannot identify an individual's record from published attributes with high probability, and (ii) provide high data utility. These mutually conflicting goals can be expressed as an optimization problem with privacy as the constraint and utility as the objective function. Conventional research using the k-anonymity model has resorted to publishing data in homogeneous generalized groups. A recently proposed alternative does not create such cliques; instead, it recasts data values in a heterogeneous manner, aiming for higher utility. Nevertheless, such works never defined the problem in the most general terms; thus, the utility gains they achieve are limited. In this paper, we propose a methodology that achieves the full potential of heterogeneity and gains higher utility while providing the same privacy guarantee. We formulate the problem of maximal-utility k-anonymization by freeform generalization as a network flow problem. We develop an optimal solution therefor using Mixed Integer Programming. Given the non-scalability of this solution, we develop an O(k n²) Greedy algorithm that has no time-complexity disadvantage vis-à-vis previous approaches, an O(k n² log n) enhanced version thereof, and an O(k n³) adaptation of the Hungarian algorithm; these algorithms build a set of k perfect matchings from original to anonymized data, a novel approach to the problem. Moreover, our techniques can resist adversaries who may know the employed algorithms. Our experiments with real-world data verify that our schemes achieve near-optimal utility (with gains of up to 41%), while they can exploit parallelism and data partitioning, gaining an efficiency advantage over simpler methods.
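The idea of building k perfect matchings from original to anonymized records can be illustrated with a toy sketch. The construction below is not one of the paper's algorithms (Greedy, its enhanced version, or the Hungarian adaptation). It assumes a single numeric attribute, already sorted, and uses k cyclic shifts as k edge-disjoint perfect matchings. Each anonymized record is then generalized to the smallest range covering the k originals matched to it. All function names and the sample ages are hypothetical.

```python
from typing import List, Sequence, Tuple


def cyclic_matchings(n: int, k: int) -> List[List[int]]:
    """Return k edge-disjoint perfect matchings over n records.

    Matching s maps original record i to anonymized slot (i + s) % n.
    Distinct shifts never share an edge, so their union is a k-regular
    bipartite graph (an illustrative choice, not an optimized one).
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return [[(i + s) % n for i in range(n)] for s in range(k)]


def generalize(values: Sequence[int],
               matchings: List[List[int]]) -> List[Tuple[int, int]]:
    """Generalize each anonymized slot to the range of its matched originals."""
    covered: List[List[int]] = [[] for _ in values]
    for matching in matchings:
        for orig, slot in enumerate(matching):
            covered[slot].append(values[orig])
    return [(min(c), max(c)) for c in covered]


def total_range_width(ranges: Sequence[Tuple[int, int]]) -> int:
    """A crude information-loss proxy: the sum of published range widths."""
    return sum(hi - lo for lo, hi in ranges)


# Hypothetical sorted ages; k = 3 means each original record is
# compatible with three published (generalized) records.
ages = [23, 25, 31, 34, 40, 47]
matchings = cyclic_matchings(len(ages), k=3)
ranges = generalize(ages, matchings)
print(ranges)
print(total_range_width(ranges))
```

In this toy setting the wrap-around edges (for example, age 47 matched to the slot of age 23) produce very wide ranges. That is the kind of utility loss a cost-aware choice of matchings would try to avoid.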

Index Terms

k-Anonymization by Freeform Generalization
1. Security and privacy
  1. Database and storage security
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Theory of database privacy and security

Published In

ASIA CCS '15: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security

April 2015

698 pages

ISBN: 9781450332453

DOI: 10.1145/2714576

General Chairs: Feng Bao (Huawei Technologies Pte Ltd, Singapore), Steven Miller (Singapore Management University, Singapore)

Program Chairs: Jianying Zhou (Institute for Infocomm Research, Singapore), Gail-Joon Ahn (Arizona State University, USA)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASIA CCS '15

Sponsor:

SIGSAC

ASIA CCS '15: 10th ACM Symposium on Information, Computer and Communications Security

April 14 - 17, 2015

Singapore, Republic of Singapore

Acceptance Rates

ASIA CCS '15 Paper Acceptance Rate 48 of 269 submissions, 18%;

Overall Acceptance Rate 418 of 2,322 submissions, 18%
