ABSTRACT
Traditionally, application software developers test against their own local development databases. However, such local databases usually hold only a small amount of sample data and therefore cannot satisfactorily simulate a live environment, especially for performance and scalability testing. Testing applications against live production databases, on the other hand, is increasingly problematic in most situations, because it can expose sensitive data to an unauthorized tester and can incorrectly update information in the underlying database. In this paper, we investigate techniques to generate mock databases for application software testing without revealing any confidential information from the live production databases. Specifically, we will design mechanisms to create the deterministic rule set R, the non-deterministic rule set NR, and the statistical data set S for a live production database. We will then build a Security Analyzer that processes the triplet <R, NR, S> together with the security requirements (the security policy) and outputs a new triplet <R', NR', S'>. The Security Analyzer will guarantee that no confidential information can be inferred from the new triplet <R', NR', S'>. The mock database generated from this new triplet can simulate the live environment for testing purposes while preserving the privacy of the data in the original database.
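The pipeline outlined in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the paper's method: the `Profile` class, the encoding of R as per-column value ranges, NR as per-column categorical distributions, and S as per-column (mean, standard deviation) pairs are all hypothetical. In particular, the `analyze` function only drops entries that mention a confidential column, which is far weaker than the inference guarantee the abstract attributes to the Security Analyzer.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for the triplet <R, NR, S>."""
    rules: dict = field(default_factory=dict)       # R: column -> (low, high) hard range
    soft_rules: dict = field(default_factory=dict)  # NR: column -> {value: probability}
    stats: dict = field(default_factory=dict)       # S: column -> (mean, std deviation)


def analyze(profile, confidential):
    """Toy analyzer (assumption): remove every entry about a confidential column.

    This does NOT rule out inference from the remaining entries.
    """
    def keep(entries):
        return {col: v for col, v in entries.items() if col not in confidential}

    return Profile(keep(profile.rules), keep(profile.soft_rules), keep(profile.stats))


def generate(profile, n, seed=0):
    """Draw n mock rows that respect R and roughly follow NR and S."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        # Numeric columns: sample from S, then clamp into the range required by R.
        for col, (mean, std) in profile.stats.items():
            value = rng.gauss(mean, std)
            if col in profile.rules:
                low, high = profile.rules[col]
                value = min(max(value, low), high)
            row[col] = round(value)
        # Categorical columns: sample according to the probabilities in NR.
        for col, dist in profile.soft_rules.items():
            values = list(dist)
            weights = [dist[v] for v in values]
            row[col] = rng.choices(values, weights=weights)[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    live = Profile(
        rules={"age": (18, 65), "salary": (1000, 9000)},
        soft_rules={"dept": {"A": 0.5, "B": 0.5}},
        stats={"age": (40.0, 10.0), "salary": (5000.0, 1500.0)},
    )
    released = analyze(live, confidential={"salary"})
    for row in generate(released, 3):
        print(row)
```

The point of the sketch is the data flow: the mock rows are generated only from the filtered triplet, never from the production rows themselves.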
Index Terms
- Privacy preserving database application testing