Constrained Anonymization of Production Data: A Constraint Satisfaction Problem Approach

  • Conference paper
Secure Data Management (SDM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6358)

Abstract

Using production data that contains sensitive information for application testing requires that the data be anonymized first. Anonymization is difficult because production data is usually subject to constraints that must also hold in the anonymized data. We propose a novel approach to anonymizing constrained production data based on constraint satisfaction problems (CSPs). Owing to the generality of the constraint satisfaction framework, our approach supports a wide variety of mandatory integrity constraints as well as constraints that ensure the similarity of the anonymized data to the production data. The approach decomposes the constrained anonymization problem into independent sub-problems that can be represented and solved as CSPs. Since production databases may contain many records that are associated by vertical constraints, the resulting CSPs may become very large. Such CSPs are further decomposed into dependent sub-problems that are solved iteratively by applying local modifications to the production data. Simulations on synthetic production databases demonstrate the feasibility of our method.
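
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table, the field names (`salaries`, `manager_of`), the `max_shift` and `step` parameters, and the plain backtracking search are all assumptions made for illustration. Each record's anonymized salary is a CSP variable whose domain is a set of values close to, but different from, the production value (a similarity constraint). A hypothetical integrity constraint requires an employee to earn less than their manager. Records that share no constraint are grouped into independent sub-problems and solved separately.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, int]
Constraint = Callable[[Assignment], bool]


def backtrack(variables: List[str], domains: Dict[str, List[int]],
              constraints: List[Constraint],
              assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Plain backtracking search: return the first consistent full assignment."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if all(check(assignment) for check in constraints):
            solution = backtrack(variables, domains, constraints, assignment)
            if solution is not None:
                return solution
        del assignment[var]
    return None


def components(records: List[str], links: List[Tuple[str, str]]) -> List[List[str]]:
    """Group records connected by a constraint; groups can be solved independently."""
    neighbours = {r: set() for r in records}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in records:
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(neighbours[node] - seen)
        groups.append(group)
    return groups


def anonymize(salaries: Dict[str, int], manager_of: Dict[str, str],
              max_shift: int = 500, step: int = 100) -> Optional[Assignment]:
    """Solve one small CSP per independent group of records (hypothetical schema)."""
    links = list(manager_of.items())
    result: Assignment = {}
    for group in components(list(salaries), links):
        # Similarity constraint, encoded in the domain: stay within max_shift
        # of the production value, but never reuse the production value itself.
        domains = {
            name: [v for v in range(salaries[name] - max_shift,
                                    salaries[name] + max_shift + 1, step)
                   if v != salaries[name]]
            for name in group
        }
        # Integrity constraint: an employee earns less than their manager.
        # It is treated as satisfied while either side is still unassigned.
        constraints = [
            (lambda a, e=emp, m=mgr: e not in a or m not in a or a[e] < a[m])
            for emp, mgr in links if emp in group
        ]
        solution = backtrack(group, domains, constraints)
        if solution is None:
            return None
        result.update(solution)
    return result


production = {"alice": 5200, "bob": 4100, "carol": 3000}
masked = anonymize(production, manager_of={"bob": "alice"})
```

Here `alice` and `bob` form one sub-problem because the manager constraint links them, while `carol` is solved on her own. The sketch covers only the decomposition into independent sub-problems; the iterative handling of very large CSPs through local modifications, which the abstract also describes, is not modeled.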

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yahalom, R., Shmueli, E., Zrihen, T. (2010). Constrained Anonymization of Production Data: A Constraint Satisfaction Problem Approach. In: Jonker, W., Petković, M. (eds) Secure Data Management. SDM 2010. Lecture Notes in Computer Science, vol 6358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15546-8_4

  • DOI: https://doi.org/10.1007/978-3-642-15546-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15545-1

  • Online ISBN: 978-3-642-15546-8

  • eBook Packages: Computer Science, Computer Science (R0)
