research-article

k-Anonymization by Freeform Generalization

Published: 14 April 2015

Abstract

Syntactic data anonymization strives to (i) ensure that an adversary cannot identify an individual's record from published attributes with high probability, and (ii) provide high data utility. These mutually conflicting goals can be expressed as an optimization problem with privacy as the constraint and utility as the objective function. Conventional research using the k-anonymity model has resorted to publishing data in homogeneous generalized groups. A recently proposed alternative does not create such cliques; instead, it recasts data values in a heterogeneous manner, aiming for higher utility. Nevertheless, such works never defined the problem in the most general terms; thus, the utility gains they achieve are limited. In this paper, we propose a methodology that achieves the full potential of heterogeneity and gains higher utility while providing the same privacy guarantee. We formulate the problem of maximal-utility k-anonymization by freeform generalization as a network flow problem. We develop an optimal solution therefor using Mixed Integer Programming. Given the non-scalability of this solution, we develop an $O(kn^2)$ Greedy algorithm that has no time-complexity disadvantage vis-à-vis previous approaches, an $O(kn^2 \log n)$ enhanced version thereof, and an $O(kn^3)$ adaptation of the Hungarian algorithm; these algorithms build a set of k perfect matchings from original to anonymized data, a novel approach to the problem. Moreover, our techniques can resist adversaries who may know the employed algorithms. Our experiments with real-world data verify that our schemes achieve near-optimal utility (with gains of up to 41%), while they can exploit parallelism and data partitioning, gaining an efficiency advantage over simpler methods.
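To make the phrase "a set of k perfect matchings from original to anonymized data" concrete, the following is a minimal sketch and not the paper's Greedy or Hungarian-based algorithm. It assumes records are numeric tuples that have already been sorted, builds k edge-disjoint perfect matchings by simple cyclic shifts, and then generalizes each published record to the per-attribute range covering the k originals matched to it. The function names `cyclic_matchings` and `generalize`, and the range-based generalization rule, are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]
Interval = Tuple[float, float]


def cyclic_matchings(n: int, k: int) -> List[List[int]]:
    """Return k pairwise edge-disjoint perfect matchings over n records.

    Matching j sends original record i to anonymized slot (i + j) % n.
    This is an illustrative baseline, not a utility-optimizing method.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return [[(i + j) % n for i in range(n)] for j in range(k)]


def generalize(records: Sequence[Record],
               matchings: Sequence[Sequence[int]]) -> List[List[Interval]]:
    """Generalize each anonymized slot to cover all originals matched to it.

    Assumption: a generalized value is the [min, max] range per attribute.
    """
    n, dims = len(records), len(records[0])
    covered: List[List[Record]] = [[] for _ in range(n)]
    for matching in matchings:
        for orig, slot in enumerate(matching):
            covered[slot].append(records[orig])
    return [
        [(min(r[d] for r in group), max(r[d] for r in group))
         for d in range(dims)]
        for group in covered
    ]


if __name__ == "__main__":
    ages = [(21.0,), (23.0,), (30.0,), (31.0,)]  # toy data, sorted by age
    for slot, ranges in enumerate(generalize(ages, cyclic_matchings(4, 2))):
        print(slot, ranges)
```

Because every original record appears once in each of the k matchings, each record is compatible with k distinct published records, and each published record covers k originals. The published groups are not cliques, which is the heterogeneity the abstract refers to. A utility-aware method would choose the matchings to keep the covering ranges narrow instead of using fixed shifts.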


Published In

ASIA CCS '15: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security
April 2015
698 pages
ISBN:9781450332453
DOI:10.1145/2714576
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anonymization
  2. freeform generalization
  3. privacy

Qualifiers

  • Research-article

Conference

ASIA CCS '15: 10th ACM Symposium on Information, Computer and Communications Security
April 14 - 17, 2015
Singapore, Republic of Singapore

Acceptance Rates

ASIA CCS '15 Paper Acceptance Rate 48 of 269 submissions, 18%;
Overall Acceptance Rate 418 of 2,322 submissions, 18%
