Synthesizing: Art of Anonymization

Gu, Jun; Chen, Yuexian; Fu, Junning; Peng, Huanchun; Ye, Xiaojun

doi:10.1007/978-3-642-15364-8_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Although a number of anonymization techniques exist for microdata publication, two problems remain: (1) privacy breaches enabled by auxiliary knowledge; (2) large information loss during anonymization. We establish the requirement of presence anonymity and propose a two-step synthesizing process: learning a model from the original data, then sampling from it a published version that has similar statistical characteristics and includes fake records. The advantage is that it prevents auxiliary-knowledge attacks while enabling researchers to reach correct or approximately correct conclusions. Its effectiveness is demonstrated through extensive experiments.
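The two-step process described in the abstract (learn a model, then sample a published version) can be illustrated with a minimal sketch. The abstract does not say which model the authors use, so everything below is an assumption made for illustration: records are tuples of numeric attributes, the "model" is a product-kernel Gaussian density estimate with a rule-of-thumb bandwidth per attribute, and the names `learn_model` and `sample_published` are hypothetical. It is not the paper's algorithm and makes no claim about the privacy guarantee it achieves.

```python
import random
import statistics


def learn_model(records):
    """Step 1 (assumed model): keep the records plus one bandwidth per attribute.

    The bandwidth uses the rule of thumb 1.06 * sd * n^(-1/5), applied to each
    attribute separately for simplicity. This is an illustrative choice only.
    """
    n = len(records)
    columns = list(zip(*records))
    bandwidths = [1.06 * statistics.stdev(col) * n ** (-1 / 5) for col in columns]
    return list(records), bandwidths


def sample_published(model, size, rng):
    """Step 2: draw `size` fake records from the kernel density estimate.

    Sampling from a Gaussian kernel density estimate amounts to picking an
    original record at random and adding Gaussian noise to every attribute.
    """
    records, bandwidths = model
    published = []
    for _ in range(size):
        base = rng.choice(records)
        published.append(tuple(rng.gauss(v, h) for v, h in zip(base, bandwidths)))
    return published


# Hypothetical microdata: (age, income) pairs invented for this example.
original = [(25, 31000), (32, 42000), (41, 50000),
            (47, 58000), (53, 61000), (60, 67000)]

rng = random.Random(0)  # fixed seed so the run is repeatable
model = learn_model(original)
published = sample_published(model, size=2000, rng=rng)

orig_mean_age = statistics.mean(r[0] for r in original)
pub_mean_age = statistics.mean(r[0] for r in published)
print(f"mean age: original={orig_mean_age:.1f}, published={pub_mean_age:.1f}")
```

The published table here consists entirely of fake records drawn from the learned distribution, so aggregate statistics such as the mean age stay close to the original while individual rows are synthetic. A real deployment would need a model and sampling scheme justified against the presence-anonymity requirement defined in the paper.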

Author information

Authors and Affiliations

School of Software, Tsinghua University,
Jun Gu, Yuexian Chen, Junning Fu, Huanchun Peng & Xiaojun Ye

Editor information

Editors and Affiliations

DeustoTech Computing, University of Deusto, Avda. Universidades, 24, 48007, Bilbao, Spain
Pablo García Bringas
Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Faculty of Computer Science, Department of Distributed Systems and Multimedia Systems, University of Vienna, Liebiggasse 4/3-4, 1010, Vienna, Austria
Gerald Quirchmayr

About this paper

Cite this paper

Gu, J., Chen, Y., Fu, J., Peng, H., Ye, X. (2010). Synthesizing: Art of Anonymization. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_33

DOI: https://doi.org/10.1007/978-3-642-15364-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15363-1
Online ISBN: 978-3-642-15364-8
eBook Packages: Computer Science, Computer Science (R0)
