Abstract
Existing approaches to integrating private data with public data have provided useful privacy metrics, such as relative information gain, that can be used to evaluate alternatives. However, they have not addressed critical performance issues, especially when the public database is very large. Using hashes and noise yields better performance than existing techniques while still making it difficult for unauthorized entities to determine which data items truly exist in the private database. As we show here, the uncertainty introduced by hash collisions and injected noise can be leveraged to perform a privacy-preserving relational join between a massive public table and a relatively small private one.
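The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea, under stated assumptions: the private side sends coarse hash buckets of its join keys plus random decoy buckets, the public side returns every row that falls into a requested bucket, and the private side completes the exact join locally. The function names, the bucket count, the noise count, and the sample tables are all hypothetical and are not taken from the paper.

```python
import hashlib
import random


def bucket(key: str, num_buckets: int) -> int:
    """Map a join key to one of num_buckets coarse hash values (collisions are intended)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_query(private_keys, num_buckets, num_noise, rng):
    """Private side: hash the real keys, then add random decoy buckets as noise."""
    real = {bucket(k, num_buckets) for k in private_keys}
    unused = [b for b in range(num_buckets) if b not in real]
    noise = set(rng.sample(unused, min(num_noise, len(unused))))
    return real | noise


def answer_query(public_rows, query_buckets, num_buckets):
    """Public side: return every row whose key hashes into a requested bucket."""
    return [row for row in public_rows if bucket(row[0], num_buckets) in query_buckets]


def finish_join(private_rows, candidates):
    """Private side: drop rows matched only by collisions or noise via an exact local join."""
    by_key = {}
    for key, value in candidates:
        by_key.setdefault(key, []).append(value)
    return [(key, priv, pub) for key, priv in private_rows for pub in by_key.get(key, [])]


# Hypothetical sample data: a large public table and a small private one.
public_table = [(f"gene{i}", f"annotation{i}") for i in range(1000)]
private_table = [("gene7", "assay A"), ("gene42", "assay B"), ("geneX", "assay C")]
NUM_BUCKETS = 64  # assumed value; fewer buckets means more collisions

query = build_query([k for k, _ in private_table], NUM_BUCKETS, num_noise=5, rng=random.Random(0))
candidates = answer_query(public_table, query, NUM_BUCKETS)
joined = finish_join(private_table, candidates)
```

In this sketch the public side sees only bucket numbers, each of which matches many public keys, and it cannot tell real buckets from decoys. Fewer buckets or more decoys increase that uncertainty at the cost of returning more candidate rows, which is the privacy and performance trade-off the abstract refers to.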
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pon, R.K., Critchlow, T. (2005). Performance-Oriented Privacy-Preserving Data Integration. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_19
DOI: https://doi.org/10.1007/11530084_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science (R0)