Abstract
Privacy is a major concern when users query public online data services. The privacy of millions of people has been jeopardized in numerous user data leakage incidents in many popular online applications. To address the critical problem of personal data leakage through queries, we enable private querying on public data services so that the contents of user queries and any user data are hidden from the online service providers. We propose two protocols for private processing of database queries, namely BHE and HHE. The two protocols provide strong query privacy by using Paillier’s homomorphic encryption, and they support common database queries such as range and join queries by relying on the bucketization of public data. In contrast to traditional Private Information Retrieval proposals, BHE and HHE require only one round of client-server communication to process a single query. BHE is a basic private query processing protocol that provides complete query privacy but still incurs expensive computation and communication costs. Built upon BHE, HHE is a hybrid protocol that applies ciphertext computation and communication to a subset of the data. This subset not only covers the requested data but also resembles frequent query patterns of common users, so HHE achieves practical query performance while ensuring adequate privacy levels. By using frequent query patterns and data-specific privacy protection, HHE is not vulnerable to the traditional attacks on k-Anonymity that exploit data similarity and skewness. Moreover, HHE consistently protects user query privacy across a sequence of queries within a single query session.
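The mechanism the abstract relies on, hiding which bucket a query touches by using Paillier’s additively homomorphic encryption, can be illustrated with the textbook encrypted-indicator-vector technique for single-server retrieval. The sketch below is a hypothetical toy under stated assumptions, not the authors’ BHE or HHE protocol: the key sizes are insecure, each bucket’s contents are assumed to be encoded as one non-negative integer smaller than the modulus n, equi-width bucketization is assumed for range queries, and every function name (`keygen`, `client_query`, `server_answer`, `buckets_for_range`) is invented for illustration. It requires Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy Paillier key generation. The Mersenne primes 2^31 - 1 and 2^61 - 1 are far
# too small for real security; they only keep this illustration fast and correct.
def keygen(p=(1 << 31) - 1, q=(1 << 61) - 1):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With generator g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (n, lam, mu)  # public key n, private key (n, lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1] coprime to n
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2  # c = g^m * r^n mod n^2

def decrypt(priv, c):
    n, lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n  # m = L(u) * mu mod n, with L(u) = (u - 1) / n

def client_query(n, num_buckets, wanted):
    # Encrypted indicator vector: E(1) for the wanted bucket, E(0) elsewhere.
    # Ciphertexts are randomized, so the server cannot tell which entry encrypts 1.
    return [encrypt(n, 1 if i == wanted else 0) for i in range(num_buckets)]

def server_answer(n, query, buckets):
    # Homomorphic selection: prod_i E(s_i)^{d_i} = E(sum_i s_i * d_i) = E(d_wanted).
    # Assumes each bucket value d_i is a non-negative integer smaller than n.
    n2 = n * n
    acc = 1
    for c, d in zip(query, buckets):
        acc = (acc * pow(c, d, n2)) % n2
    return acc

def buckets_for_range(lo, hi, width):
    # Hypothetical equi-width bucketization: bucket i covers keys [i*width, (i+1)*width).
    return list(range(lo // width, hi // width + 1))

if __name__ == "__main__":
    n, priv = keygen()
    buckets = [17, 42, 99, 7]  # hypothetical integer-encoded bucket contents
    # A range query over keys [15, 34] with width 10 touches buckets 1..3; the client
    # sends one indicator vector per covering bucket, all in a single message.
    wanted = buckets_for_range(15, 34, 10)
    queries = [client_query(n, len(buckets), b) for b in wanted]
    answers = [server_answer(n, q, buckets) for q in queries]
    print([decrypt(priv, a) for a in answers])  # [42, 99, 7]
```

The client’s message carries one ciphertext per bucket, so computation and communication grow with the total number of buckets, which matches the cost the abstract attributes to BHE. Restricting the encrypted vector to a chosen subset of buckets shrinks that cost; according to the abstract this is the direction HHE takes, although how HHE chooses the subset is not reproduced here.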
Notes
Although it also has a GPU implementation, its efficiency stems from its use of linear algebra, so we use CPU implementations of all protocols for a fair comparison.
We tried larger block sizes, such as 100 buckets, to optimize performance, but a large block size on the 10 M dataset made lPIR [17] crash.
References
[1] Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’01), pp. 247–255 (2001)
[2] Arrington, M.: AOL proudly releases massive amounts of private data (2006). http://www.techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data
[3] Bethencourt, J., Song, D., Waters, B.: New techniques for private stream searching. ACM Trans. Inf. Syst. Secur. 12, 16:1–16:32 (2009)
[4] Chor, B., Kushilevitz, E., Goldreich, O., Sudan, M.: Private information retrieval. J. ACM 45(6), 965–981 (1998)
[5] De Capitani di Vimercati, S., Foresti, S., Paraboschi, S., Pelosi, G., Samarati, P.: Efficient and private access to outsourced data. In: Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS 2011), pp. 710–719 (2011)
[6] Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: USENIX Security Symposium, pp. 303–320 (2004)
[7] Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), pp. 265–273. ACM, New York (2008)
[8] Gentry, C., Ramzan, Z.: Single-database private information retrieval with constant communication rate. In: Proceedings of the 32nd International Colloquium on Automata, Languages and Programming, pp. 803–815 (2005)
[9] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Mateo (2000)
[10] Howe, D.C., Nissenbaum, H.: TrackMeNot: resisting surveillance in web search. In: Lessons from the Identity Trail: Anonymity, Privacy, and Identity in a Networked Society, pp. 417–436. Oxford University Press, London (2009). Chap. 23
[11] Ibarra, O.H., Kim, C.E.: Fast approximation algorithms for the knapsack and sum of subset problems. J. ACM 22, 463–468 (1975)
[12] Kantarcioglu, M., Clifton, C.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. Knowl. Data Eng. 16, 1026–1037 (2004)
[13] Kushilevitz, E., Ostrovsky, R.: Replication is not needed: single database, computationally-private information retrieval. In: FOCS, pp. 364–373 (1997)
[14] Li, N., Li, T., Venkatasubramanian, S.: t-Closeness: privacy beyond k-anonymity and l-diversity. In: ICDE, pp. 106–115 (2007)
[15] McCullagh, D.: Privacy leaks hit Facebook, Google, AT&T (2010). http://news.cnet.com/2702-1009_3-986.html
[16] Melchor, C.A., Crespin, B., Gaborit, P., Jolivet, V., Rousseau, P.: High-speed private information retrieval computation on GPU. In: Secureware, pp. 263–272 (2008)
[17] Melchor, C.A., Gaborit, P.: A fast private information retrieval protocol. In: IEEE International Symposium on Information Theory, pp. 1848–1852 (2008)
[18] Mokbel, M.F., Chow, C.Y., Aref, W.G.: The new Casper: query processing for location services without compromising privacy. In: VLDB, pp. 763–774 (2006)
[19] Murugesan, M., Clifton, C.: Providing privacy through plausibly deniable search. In: SDM, pp. 768–779 (2009)
[20] Olumofin, F.G., Goldberg, I.: Revisiting the computational practicality of private information retrieval. In: Financial Cryptography, pp. 158–172 (2011)
[21] Olumofin, F.G., Tysowski, P.K., Goldberg, I., Hengartner, U.: Achieving efficient query privacy for location based services. In: Privacy Enhancing Technologies, pp. 93–110 (2010)
[22] Ostrovsky, R., Skeith, W.E.: Private searching on streaming data. J. Cryptol. 20, 397–430 (2007)
[23] Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Advances in Cryptology (EUROCRYPT’99). Lecture Notes in Computer Science, vol. 1592, pp. 223–238. Springer, Berlin (1999)
[24] Pang, H., Ding, X., Xiao, X.: Embellishing text search queries to protect user privacy. Proc. VLDB Endow. 3(1), 598–607 (2010)
[25] Peddinti, S.T., Saxena, N.: On the privacy of web search based on query obfuscation: a case study of TrackMeNot. In: Privacy Enhancing Technologies, pp. 19–37 (2010)
[26] Rebollo-Monedero, D., Forné, J.: Optimized query forgery for private information retrieval. IEEE Trans. Inf. Theory 56(9), 4631–4642 (2010)
[27] Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
[28] Samarati, P., Sweeney, L.: Generalizing data to provide anonymity when disclosing information (abstract). In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’98), p. 188 (1998)
[29] Schwartz, M.J.: Twitter finalizes FTC security settlement (2011). http://www.informationweek.com/news/security/attacks/229301037
[30] Sion, R., Carbunar, B.: On the computational practicality of private information retrieval. In: Network and Distributed System Security Symposium (2007)
[31] Wang, S., Agrawal, D., El Abbadi, A.: Generalizing PIR for practical private retrieval of public data. In: DBSec, pp. 1–16 (2010)
[32] Williams, P., Sion, R.: Usable private information retrieval. In: Network and Distributed System Security Symposium (2008)
[33] Ye, S., Wu, F., Pandey, R., Chen, H.: Noise injection for search privacy protection. In: Proceedings of the 2009 International Conference on Computational Science and Engineering, vol. 3, pp. 1–8 (2009)
Acknowledgement
This work is funded by NSF grant CNS 1053594. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.
Additional information
Communicated by Elena Ferrari.
About this article
Cite this article
Wang, S., Agrawal, D. & El Abbadi, A. Towards practical private processing of database queries over public data. Distrib Parallel Databases 32, 65–89 (2014). https://doi.org/10.1007/s10619-012-7118-y