Synthesizing: Art of Anonymization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Abstract

Although many anonymization techniques exist for microdata publication, two problems remain: (1) privacy breaches enabled by auxiliary knowledge, and (2) large information loss during anonymization. We establish the requirement of presence anonymity and propose a two-step synthesizing process: first learn a model from the original data, then sample from that model a published version that has similar statistical characteristics and includes fake records. This approach prevents auxiliary-knowledge attacks while still enabling researchers to reach correct or approximately correct conclusions. Extensive experiments demonstrate its effectiveness.
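The two-step process described in the abstract (learn a model, then sample a published version) can be illustrated with a minimal Python sketch. The abstract does not say which model the authors use, so the sketch substitutes a plain Gaussian kernel density estimate over numeric attributes. The function names `learn_model` and `sample_published`, the `bandwidth_scale` parameter, and the toy age/income data are all hypothetical and chosen only for illustration. The sketch shows the learn-then-sample shape of the idea; it does not by itself establish presence anonymity.

```python
import random
import statistics


def learn_model(records, bandwidth_scale=1.0):
    """Step 1 (assumed model): fit a Gaussian kernel density estimate.

    `records` is a list of equal-length tuples of numbers. The bandwidth
    for each column is a rule-of-thumb value proportional to the column's
    standard deviation times n^(-1/5); this choice is an assumption.
    """
    n = len(records)
    columns = list(zip(*records))
    bandwidths = [
        bandwidth_scale * statistics.stdev(col) * n ** (-1 / 5)
        for col in columns
    ]
    return {"records": list(records), "bandwidths": bandwidths}


def sample_published(model, size, rng):
    """Step 2: draw fake records from the learned density.

    Each fake record is a randomly chosen original record perturbed by
    Gaussian noise, which is equivalent to sampling from the kernel
    density estimate built in step 1.
    """
    published = []
    for _ in range(size):
        centre = rng.choice(model["records"])
        fake = tuple(
            rng.gauss(value, h)
            for value, h in zip(centre, model["bandwidths"])
        )
        published.append(fake)
    return published


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical microdata: (age, income) pairs.
    original = [(rng.gauss(40, 10), rng.gauss(50000, 8000)) for _ in range(200)]
    model = learn_model(original)
    published = sample_published(model, size=200, rng=rng)
    print("original mean age :", statistics.mean(r[0] for r in original))
    print("published mean age:", statistics.mean(r[0] for r in published))
```

Because every published record is drawn from the model rather than copied, the release consists of fake records, while aggregate statistics such as column means stay close to those of the original data. A real deployment would also need to handle categorical attributes and to quantify the privacy guarantee, neither of which this sketch attempts.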

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gu, J., Chen, Y., Fu, J., Peng, H., Ye, X. (2010). Synthesizing: Art of Anonymization. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_33

  • DOI: https://doi.org/10.1007/978-3-642-15364-8_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15363-1

  • Online ISBN: 978-3-642-15364-8

  • eBook Packages: Computer Science, Computer Science (R0)
