Abstract
A common goal of privacy research is to release synthetic data that satisfies a formal privacy guarantee and can be used by an analyst in place of the original data. To achieve reasonable accuracy, a synthetic data set must be tuned to support a specified set of queries accurately, sacrificing fidelity for other queries. This work considers methods for producing synthetic data under differential privacy and investigates what makes a set of queries “easy” or “hard” to answer. We consider this issue in the particular case of answering sets of linear counting queries using the matrix mechanism (Li et al. 2010), a recent differentially-private mechanism that can reduce error by adding complex correlated noise adapted to a specified workload. Our main result is a novel lower bound on the minimum total error required to simultaneously release answers to a set of workload queries when using the matrix mechanism. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. Under (ε, δ)-differential privacy, we prove that this bound is tight for many common workloads such as the set of all predicate queries and the set of all k-way marginals. Our empirical study also indicates this bound is close-to-tight on workloads consisting of random interval queries or random marginals.
Notes
This can also be done in an interactive (or online) setting, in which the workload queries are not known in advance, but our focus is the non-interactive (or batch) setting.
Although there is no benefit in sensitivity when the workload W is used as the strategy, there are in fact some workloads for which this special instance of the matrix mechanism has lower error than the Gaussian mechanism. This happens because the matrix mechanism is able to combine related query answers from the strategy to form more accurate answers to the workload queries.
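The effect described in this note can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the standard expression for the total expected squared error of answering a workload W through a strategy A with Gaussian noise, namely a privacy-dependent factor times the squared L2 sensitivity of A times the squared Frobenius norm of W A⁺. The function names, the `noise_scale` placeholder (standing in for the factor that depends on ε and δ), and the choice of all range queries over four cells are illustrative assumptions.

```python
import numpy as np

def all_range_queries(n):
    """Workload matrix with one 0/1 row per interval [i, j] over n cells."""
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i:j + 1] = 1.0
            rows.append(row)
    return np.array(rows)

def l2_sensitivity(A):
    """L2 sensitivity of a query matrix: its largest column L2 norm."""
    return np.linalg.norm(A, axis=0).max()

def gaussian_total_error(W, noise_scale=1.0):
    """Total expected squared error when independent Gaussian noise,
    calibrated to the sensitivity of W, is added to every query in W.
    noise_scale is a hypothetical stand-in for the (eps, delta) factor."""
    m = W.shape[0]
    return noise_scale * l2_sensitivity(W) ** 2 * m

def matrix_mechanism_total_error(W, A, noise_scale=1.0):
    """Total expected squared error when W is answered from noisy answers
    to strategy A via least squares (assumes the rows of W lie in the
    row space of A): noise_scale * ||A||_2^2 * ||W A^+||_F^2."""
    recon = W @ np.linalg.pinv(A)
    return noise_scale * l2_sensitivity(A) ** 2 * np.linalg.norm(recon, "fro") ** 2

W = all_range_queries(4)                       # 10 queries over 4 cells
gauss_err = gaussian_total_error(W)            # sensitivity^2 * number of queries
mm_err = matrix_mechanism_total_error(W, W)    # sensitivity^2 * rank(W)
print(f"Gaussian mechanism: {gauss_err:.1f}, matrix mechanism with A = W: {mm_err:.1f}")
```

With A = W the sensitivity is identical in both cases, so the difference comes entirely from the reconstruction step: W W⁺ is an orthogonal projection, so its squared Frobenius norm equals rank(W) rather than the number of queries. The gain therefore appears whenever the workload has more queries than its rank, as with the ten overlapping range queries over four cells here.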
References
Ács, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through lossy compression. In: ICDM, pp. 1–10 (2012)
Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In: PODS (2007)
Ben-Israel, A., Greville, T.: Generalized inverses: Theory and applications, vol. 15. Springer (2003)
Bhaskara, A., Dadush, D., Krishnaswamy, R., Talwar, K.: Unconditional differentially private mechanisms for linear queries. In: STOC, pp. 1269–1284 (2012)
Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: STOC, pp. 609–618 (2008)
Cormode, G., Procopiuc, M., Shen, E., Srivastava, D., Yu, T.: Differentially private spatial decompositions. In: ICDE, pp. 20–31 (2012)
Ding, B., Winslett, M., Han, J., Li, Z.: Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD, pp. 217–228 (2011)
Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: PODS, pp. 202–210 (2003)
Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: Privacy via distributed noise generation. In: EUROCRYPT, pp. 486–503 (2006)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC, pp. 265–284 (2006)
Dwork, C., Naor, M., Reingold, O., Rothblum, G., Vadhan, S.: On the complexity of differentially private data release: efficient algorithms and hardness results. In: STOC, pp. 381–390 (2009)
Dwork, C., Rothblum, G.N., Vadhan, S.P.: Boosting and differential privacy. In: FOCS, pp. 51–60 (2010)
Fawaz, N., Muthukrishnan, S., Nikolov, A.: Nearly optimal private convolution. In: ESA, pp. 445–456 (2013)
Fulton, W.: Eigenvalues, invariant factors, highest weights, and Schubert calculus. Bulletin of the AMS 37(3), 209–250 (2000)
Gray, R.M.: Toeplitz and circulant matrices: A review. Now Pub (2006)
Gupta, A., Roth, A., Ullman, J.: Iterative constructions and private data release. In: TCC, pp. 339–356 (2012)
Hardt, M., Ligett, K., McSherry, F.: A simple and practical algorithm for differentially private data release. In: NIPS, pp. 2348–2356 (2012)
Hardt, M., Rothblum, G.: A multiplicative weights mechanism for privacy-preserving data analysis. In: FOCS, pp. 61–70 (2010)
Hardt, M., Talwar, K.: On the geometry of differential privacy. In: STOC, pp. 705–714 (2010)
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially-private histograms through consistency. PVLDB 3(1-2), 1021–1032 (2010)
Kasiviswanathan, S., Rudelson, M., Smith, A., Ullman, J.: The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In: STOC, pp. 775–784 (2010)
Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: PODS, pp. 123–134 (2010)
Li, C., Miklau, G.: An adaptive mechanism for accurate query answering under differential privacy. PVLDB 5(6), 514–525 (2012)
Li, Y.D., Zhang, Z., Winslett, M., Yang, Y.: Compressive mechanism: utilizing sparse representation in differential privacy. In: WPES, pp. 177–182 (2011)
McSherry, F., Mironov, I.: Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In: SIGKDD, pp. 627–636 (2009)
Nikolov, A., Talwar, K., Zhang, L.: The geometry of differential privacy: the sparse and approximate cases. In: STOC, pp. 351–360 (2013)
Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: STOC, pp. 75–84 (2007)
Rastogi, V., Suciu, D., Hong, S.: The boundary between privacy and utility in data publishing. In: VLDB, pp. 531–542 (2007)
Roth, A., Roughgarden, T.: Interactive privacy via the median mechanism. In: STOC, pp. 765–774 (2010)
Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: ICDE, pp. 225–236 (2010)
Xiao, Y., Xiong, L., Yuan, C.: Differentially private data release through multidimensional partitioning. In: SDM, pp. 150–168 (2010)
Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G., Winslett, M.: Differentially private histogram publication. The VLDB Journal, 1–26 (2013)
Yaroslavtsev, G., Cormode, G., Procopiuc, C. M., Srivastava, D.: Accurate and efficient private release of datacubes and contingency tables. In: ICDE (2013)
Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Low-rank mechanism: Optimizing batch queries under differential privacy. PVLDB 5(11), 1136–1147 (2012)
Acknowledgments
We appreciate the thoughtful comments of the anonymous reviewers. Li was supported by NSF CNS-1012748. Miklau was partially supported by NSF CNS-1012748, NSF CNS-0964094, NSF CNS-1409143, and the European Research Council under the Webdam grant, No. 226513.
Cite this article
Li, C., Miklau, G. Lower Bounds on the Error of Query Sets Under the Differentially-Private Matrix Mechanism. Theory Comput Syst 57, 1159–1201 (2015). https://doi.org/10.1007/s00224-015-9610-z
DOI: https://doi.org/10.1007/s00224-015-9610-z