Performance-Oriented Privacy-Preserving Data Integration

Pon, Raymond K.; Critchlow, Terence

doi:10.1007/11530084_19

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3615)

Included in the following conference series:

International Workshop on Data Integration in the Life Sciences

Abstract

Current solutions to integrating private data with public data have provided useful privacy metrics, such as relative information gain, that can be used to evaluate alternative approaches. Unfortunately, they have not addressed critical performance issues, especially when the public database is very large. The use of hashes and noise yields better performance than existing techniques, while still making it difficult for unauthorized entities to distinguish which data items truly exist in the private database. As we show here, the uncertainty introduced by collisions caused by hashing and the injection of noise can be leveraged to perform a privacy-preserving relational join operation between a massive public table and a relatively smaller private one.
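The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol, only one plausible reading of "hashes and noise": the private side sends truncated hashes of its join keys mixed with random decoy hashes, the public side returns every row whose key hash matches, and the private side completes the exact join locally. All function names, the use of truncated SHA-256, and the parameters `bits` and `noise_count` are assumptions made for illustration.

```python
import hashlib
import random


def short_hash(key: str, bits: int) -> int:
    """Truncate SHA-256 to `bits` bits (1..64) so distinct keys can collide."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def build_request(private_keys, bits, noise_count, rng):
    """Private side: hashes of the real keys plus random decoy hash values."""
    real = {short_hash(k, bits) for k in private_keys}
    decoys = {rng.randrange(1 << bits) for _ in range(noise_count)}
    return real | decoys


def public_side(public_rows, request, bits):
    """Public side: return every (key, value) row whose key hash was requested.

    The public side cannot tell real hashes from decoys, and each hash
    may correspond to several public keys because of collisions.
    """
    return [row for row in public_rows if short_hash(row[0], bits) in request]


def private_side_join(private_rows, candidates):
    """Private side: exact join on the key; false positives are discarded."""
    by_key = {}
    for key, value in candidates:
        by_key.setdefault(key, []).append(value)
    return [
        (key, private_value, public_value)
        for key, private_value in private_rows
        for public_value in by_key.get(key, [])
    ]


if __name__ == "__main__":
    # Hypothetical toy data: a "large" public table and a small private one.
    public = [(f"gene{i}", f"annotation{i}") for i in range(1000)]
    private = [("gene3", "sampleA"), ("gene42", "sampleB")]
    request = build_request([k for k, _ in private], bits=8, noise_count=5,
                            rng=random.Random(0))
    candidates = public_side(public, request, bits=8)
    print(len(candidates), "candidate rows returned")
    print(private_side_join(private, candidates))
```

In this sketch a smaller `bits` value or a larger `noise_count` increases the number of candidate rows the public side returns, which makes the real keys harder to single out but costs more transfer and local filtering. That is the kind of privacy and performance trade-off the abstract refers to.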

Author information

Authors and Affiliations

UCLA Computer Science Department, Los Angeles, California, USA
Raymond K. Pon
Lawrence Livermore National Laboratory, Livermore, California, USA
Terence Critchlow

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Davis,
Bertram Ludäscher
University of Maryland, College Park, 20742, MD, USA
Louiqa Raschid

Cite this paper

Pon, R.K., Critchlow, T. (2005). Performance-Oriented Privacy-Preserving Data Integration. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_19

DOI: https://doi.org/10.1007/11530084_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27967-9
Online ISBN: 978-3-540-31879-8
eBook Packages: Computer Science, Computer Science (R0)
