ABSTRACT
A few years ago, Dinur and Nissim (PODS, 2003) proposed an algorithm for breaking database privacy when statistical queries are answered with a perturbation error of magnitude o(√n) for a database of size n. This negative result is very strong in the sense that it completely reconstructs Ω(n) data bits with an algorithm that is simple, uses random queries, and does not put any restriction on the perturbation other than its magnitude. Their algorithm works for a model where the database consists of bits, and the statistical queries asked by the adversary are sum queries for a subset of locations.
In this paper we extend the attack to work for much more general settings in terms of the type of statistical query allowed, the database domain, and the general tradeoff between perturbation and privacy. Specifically, we prove:
- For queries of the type ∑_{i=1}^{n} φ_{i}x_{i}, where the φ_{i} are i.i.d. with a finite third moment and positive variance (this includes as a special case the sum queries of Dinur-Nissim and several subsequent extensions), we prove that the quadratic relation between the perturbation and what the adversary can reconstruct holds even for smaller perturbations, and even for a larger data domain. If φ_{i} is Gaussian, Poissonian, or bounded and of positive variance, this holds for arbitrary data domains and perturbation; for other φ_{i} this holds as long as the domain is not too large and the perturbation is not too small.
- A positive result showing that for a sum query the negative result mentioned above is tight. Specifically, we build a distribution on bit databases and an answering algorithm such that any adversary who wants to recover a little more than the negative result above allows will not succeed except with negligible probability.
- We consider a richer class of summation queries, focusing on databases representing graphs, where each entry is an edge, and the query is a structural function of a subgraph. We show an attack that recovers a big portion of the graph edges, as long as the graph and the function satisfy certain properties.
The attacking algorithms in both our negative results are straightforward extensions of the Dinur-Nissim attack, based on asking φ-weighted queries or queries choosing a subgraph uniformly at random. The novelty of our work is in the analysis, showing that this simple attack is much more powerful than was previously known, as well as pointing to possible limits of this approach and putting forth new application domains such as graph problems (which may occur in social networks, Internet graphs, etc.). These results may find applications not only for breaking privacy, but also in the positive direction, for recovering complicated structure information using inaccurate estimates about its substructures.
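To make the flavor of a φ-weighted query attack concrete, the following is a minimal simulation sketch, not the algorithm or analysis of the paper. It assumes φ_{i} drawn i.i.d. from Bernoulli(1/2) (so each query is a sum over a uniformly random subset of locations), perturbation drawn uniformly at random from a bounded interval rather than chosen adversarially, and many more queries than database entries. In place of the decoding step of Dinur and Nissim it swaps in a simple correlation estimator followed by rounding. The function name `simulate_attack` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def simulate_attack(n=64, m=4000, noise=2.0, seed=0):
    """Correlation decoder against noisy random subset-sum queries.

    Illustrative assumptions (not taken from the paper): Bernoulli(1/2)
    weights, uniformly random bounded noise, and m much larger than n.
    Returns the fraction of database bits recovered correctly.
    """
    rng = random.Random(seed)
    # Secret bit database x in {0,1}^n.
    x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]

    queries, answers = [], []
    for _ in range(m):
        # phi_i i.i.d. Bernoulli(1/2): a uniformly random subset of locations.
        phi = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
        exact = sum(p * xi for p, xi in zip(phi, x))
        # Perturbed answer: the error magnitude is at most `noise`.
        answers.append(exact + rng.uniform(-noise, noise))
        queries.append(phi)

    mean_answer = sum(answers) / m
    mu, var = 0.5, 0.25  # mean and variance of each phi_i

    guess = []
    for i in range(n):
        # E[(phi_i - mu) * (answer - E[answer])] = var * x_i, so this
        # normalized average estimates x_i; round it to the nearest bit.
        score = sum((q[i] - mu) * (a - mean_answer)
                    for q, a in zip(queries, answers)) / (m * var)
        guess.append(1 if score > 0.5 else 0)

    correct = sum(g == xi for g, xi in zip(guess, x))
    return correct / n


if __name__ == "__main__":
    print(f"fraction of bits recovered: {simulate_attack():.3f}")
```

The estimator relies only on the positive variance of the weights, which is why the same sketch could be adapted to other weight distributions by changing `mu` and `var`. It says nothing about the adversarial-noise regime or the tight tradeoffs that the analysis above addresses.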
REFERENCES
- Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Samir Khuller, Rina Panigrahy, Dilys Thomas, and An Zhu. Achieving anonymity via clustering. ACM Transactions on Algorithms (TALG), 6, 2010.
- Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. 2000.
- Hai Brenner and Kobbi Nissim. Impossibility of differentially private universally optimal mechanisms. CoRR, abs/1008.0256, 2010.
- Moses Charikar, editor. Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010. SIAM, 2010.
- Dorothy E. Denning and Peter J. Denning. Cryptography and Data Security. 1982.
- Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Serge Vaudenay, editor, EUROCRYPT, volume 4004 of Lecture Notes in Computer Science, pages 486--503. Springer, 2006.
- Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486--503, 2006.
- Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265--284. Springer, 2006.
- Cynthia Dwork, Frank McSherry, and Kunal Talwar. The price of privacy and the limits of LP decoding. In David S. Johnson and Uriel Feige, editors, STOC, pages 85--94. ACM, 2007.
- Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pages 202--210. ACM, 2003.
- Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin. Pan-private streaming algorithms. In Andrew Chi-Chih Yao, editor, ICS, pages 66--80. Tsinghua University Press, 2010.
- Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Schulman {Sch10}, pages 715--724.
- Cynthia Dwork. Differential privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1--12. Springer, 2006.
- Cynthia Dwork. Ask a better question, get a better answer: a new approach to private data analysis. In Thomas Schwentick and Dan Suciu, editors, ICDT, volume 4353 of Lecture Notes in Computer Science, pages 18--27. Springer, 2007.
- Cynthia Dwork. The differential privacy frontier (extended abstract). In Omer Reingold, editor, TCC, volume 5444 of Lecture Notes in Computer Science, pages 496--502. Springer, 2009.
- Cynthia Dwork. Differential privacy in new settings. In Charikar {Cha10}, pages 174--183.
- Cynthia Dwork and Sergey Yekhanin. New efficient attacks on statistical disclosure control mechanisms. In David Wagner, editor, CRYPTO, volume 5157 of Lecture Notes in Computer Science, pages 469--480. Springer, 2008.
- Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. Differentially private approximation algorithms. CoRR, abs/0903.4510, 2009.
- Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. Differentially private combinatorial optimization. In Charikar {Cha10}, pages 1106--1125.
- Anupam Gupta, Aaron Roth, and Jonathan Ullman. Iterative constructions and private data release. CoRR, abs/1107.3731, 2011.
- Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Schulman {Sch10}, pages 705--714.
- Krishnaram Kenthapadi, Nina Mishra, and Kobbi Nissim. Simulatable auditing. Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS), 2005.
- Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Smith, and Jonathan Ullman. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In Schulman {Sch10}, pages 775--784.
- Martin Merener. Polynomial-time attack on output perturbation sanitizers for real-valued datasets. Journal of Privacy and Confidentiality, 2(2):65--81, 2010.
- Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In John F. Elder IV, Françoise Fogelman-Soulié, Peter A. Flach, and Mohammed Javeed Zaki, editors, KDD, pages 627--636. ACM, 2009.
- Frank McSherry and Ratul Mahajan. Differentially-private network trace analysis. In Shivkumar Kalyanaraman, Venkata N. Padmanabhan, K. K. Ramakrishnan, Rajeev Shorey, and Geoffrey M. Voelker, editors, SIGCOMM, pages 123--134. ACM, 2010.
- Shubha U. Nabar, Bhaskara Marthi, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani. Towards robustness in query auditing. VLDB '06 Proceedings of the 32nd international conference on Very large data bases, 2006.
- Kobbi Nissim, Rann Smorodinsky, and Moshe Tennenholtz. Approximately optimal mechanism design via differential privacy. CoRR, abs/1004.2888, 2010.
- K. Neammanee and P. Thongtha. Improvement of the non-uniform version of Berry-Esseen inequality via Paditz-Siganov theorems. Journal of Inequalities in Pure and Applied Mathematics (JIPM), 8(4), 2007.
- Leonard J. Schulman, editor. Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5--8 June 2010. ACM, 2010.
- Sheldon M. Ross. Stochastic Processes. Wiley, 1996.
- Sergey Yekhanin. Private information retrieval. Commun. ACM, 53(4):68--73, 2010.
Index Terms
The power of the Dinur-Nissim algorithm: breaking privacy of statistical and graph databases