Abstract
Using production data that contains sensitive information for application testing requires that the data first be anonymized. Anonymizing production data is difficult because the data are usually subject to constraints that must also be satisfied by the anonymized data. We propose a novel approach to anonymizing constrained production data based on the concept of constraint satisfaction problems (CSPs). Owing to the generality of the constraint satisfaction framework, our approach supports a wide variety of mandatory integrity constraints, as well as constraints that keep the anonymized data similar to the production data. Our approach decomposes the constrained anonymization problem into independent sub-problems, each of which can be represented and solved as a CSP. Since production databases may contain many records that are associated by vertical constraints, the resulting CSPs may become very large. Such CSPs are further decomposed into dependent sub-problems that are solved iteratively by applying local modifications to the production data. Simulations on synthetic production databases demonstrate the feasibility of our method.
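The first decomposition step can be illustrated with a minimal sketch. This is not the authors' implementation. The cell-level variables `(record id, column)`, the toy records, the candidate SSN pool and the helper names `independent_subproblems`, `backtrack` and `anonymize` are all hypothetical. Plain chronological backtracking stands in for whatever CSP solver the paper actually uses. Variables that share a constraint are grouped into one connected component, and each component is then solved as a separate CSP. Each domain excludes the original value, and the numeric domains are restricted to values near the original, which is a simple way to encode a similarity constraint.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Var = Tuple[str, str]  # hypothetical encoding: (record id, column name)
Constraint = Tuple[Tuple[Var, ...], Callable[..., bool]]


def independent_subproblems(variables: Sequence[Var], constraints: List[Constraint]):
    """Group variables into connected components of the constraint graph.
    Components share no constraint, so each one is an independent CSP."""
    parent = {v: v for v in variables}

    def find(v: Var) -> Var:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for scope, _ in constraints:  # union all variables of each constraint
        root = find(scope[0])
        for v in scope[1:]:
            parent[find(v)] = root

    groups: Dict[Var, List[Var]] = {}
    for v in variables:
        groups.setdefault(find(v), []).append(v)
    subproblems = []
    for vs in groups.values():
        members = set(vs)
        # All variables of a constraint share a component, so checking scope[0] suffices.
        subproblems.append((vs, [c for c in constraints if c[0][0] in members]))
    return subproblems


def consistent(constraints: List[Constraint], assignment: Dict[Var, object]) -> bool:
    """Check every constraint whose variables are all assigned."""
    return all(pred(*(assignment[v] for v in scope))
               for scope, pred in constraints
               if all(v in assignment for v in scope))


def backtrack(order: List[Var], domains, constraints, assignment=None):
    """Chronological backtracking over the variables in `order`."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(order):
        return dict(assignment)
    var = order[len(assignment)]
    for value in domains[var]:
        assignment[var] = value
        if consistent(constraints, assignment):
            solution = backtrack(order, domains, constraints, assignment)
            if solution is not None:
                return solution
        del assignment[var]
    return None


def anonymize(production, domains, constraints):
    """Solve each independent sub-problem separately and merge the results."""
    anonymized = {}
    for variables, sub_constraints in independent_subproblems(list(production), constraints):
        solution = backtrack(variables, domains, sub_constraints)
        if solution is None:
            return None  # this sub-problem is unsatisfiable
        anonymized.update(solution)
    return anonymized


# Hypothetical toy production data.
production = {("r1", "ssn"): "111", ("r2", "ssn"): "222",
              ("r1", "age"): 30, ("r1", "years_employed"): 8}

# Domains exclude the original value; numbers may move by at most 3 (similarity).
pool = ["111", "222", "333", "444"]  # hypothetical candidate SSNs
domains = {v: ([s for s in pool if s != x] if v[1] == "ssn"
               else [n for n in range(x - 3, x + 4) if n != x])
           for v, x in production.items()}

constraints: List[Constraint] = [
    ((("r1", "ssn"), ("r2", "ssn")), lambda a, b: a != b),                  # key constraint
    ((("r1", "age"), ("r1", "years_employed")), lambda a, y: y <= a - 18),  # integrity constraint
]

result = anonymize(production, domains, constraints)
print(result)
```

Because the key constraint and the age/employment constraint share no variables, they form two separate CSPs here. A component tied together by many vertical constraints would remain one large CSP. The abstract handles that case by further decomposition and iterative local modification, which this sketch does not attempt.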
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yahalom, R., Shmueli, E., Zrihen, T. (2010). Constrained Anonymization of Production Data: A Constraint Satisfaction Problem Approach. In: Jonker, W., Petković, M. (eds) Secure Data Management. SDM 2010. Lecture Notes in Computer Science, vol 6358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15546-8_4
DOI: https://doi.org/10.1007/978-3-642-15546-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15545-1
Online ISBN: 978-3-642-15546-8
eBook Packages: Computer Science, Computer Science (R0)