DOI: 10.1145/1989323.1989347

Differentially private data cubes: optimizing noise sources and consistency

Published: 12 June 2011

Abstract

Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln|L| + 1)^2 of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 - 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.
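
The publishing strategy described in the abstract (noise a chosen initial cuboid directly, then derive the remaining cuboids from it) can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes count cuboids, neighboring tables that differ by one added or removed row (so each cuboid has sensitivity 1), and an even split of the privacy budget across the initial set, which gives Laplace noise of scale |S|/epsilon for an initial set S. The function names, the toy fact table, and the use of Python's standard library are all illustrative assumptions.

```python
import random
from collections import Counter
from itertools import product

def cuboid(rows, dims):
    """Exact count cuboid: group rows by the chosen dimension indices."""
    return Counter(tuple(row[d] for d in dims) for row in rows)

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_cuboid(rows, dims, domains, scale, rng):
    """Noisy count for every cell in the cuboid's domain, empty cells included."""
    exact = cuboid(rows, dims)
    cells = product(*(domains[d] for d in dims))
    return {cell: exact.get(cell, 0) + laplace(scale, rng) for cell in cells}

def roll_up(noisy, dims, target_dims):
    """Derive a coarser cuboid by summing the cells of a noisy finer one."""
    keep = [dims.index(d) for d in target_dims]
    out = {}
    for cell, value in noisy.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0.0) + value
    return out

if __name__ == "__main__":
    # Hypothetical fact table with dimensions (sex, age group, diagnosis).
    domains = [("F", "M"), ("20s", "30s"), ("flu", "cold")]
    rows = [("F", "20s", "flu"), ("M", "20s", "flu"), ("F", "30s", "cold")]

    epsilon = 1.0
    initial_set = [(0, 1, 2)]            # assumed initial set: the base cuboid only
    scale = len(initial_set) / epsilon   # assumed even budget split, sensitivity 1

    rng = random.Random(0)
    base = noisy_cuboid(rows, (0, 1, 2), domains, scale, rng)
    by_sex = roll_up(base, (0, 1, 2), (0,))  # derived without touching the fact table
    print(by_sex)
```

Under these assumptions, a derived cell that sums k noisy cells has noise variance 2k(|S|/epsilon)^2, whereas adding a cuboid to the initial set raises the scale |S|/epsilon for every directly computed cuboid. That tension is what makes the choice of the initial set an optimization problem.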

Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN: 9781450306614
DOI: 10.1145/1989323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OLAP
  2. data cube
  3. differential privacy
  4. private data analysis

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '11
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 5
Reflects downloads up to 02 Mar 2025

Cited By

  • (2024) SLIM-View: Sampling and Private Publishing of Multidimensional Databases. Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy, pages 391-402. DOI: 10.1145/3626232.3653275. Online publication date: 19-Jun-2024
  • (2024) Privacy-Preserving Traffic Flow Release with Consistency Constraints. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 1699-1711. DOI: 10.1109/ICDE60146.2024.00138. Online publication date: 13-May-2024
  • (2023) An optimal and scalable matrix mechanism for noisy marginals under convex loss functions. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 20495-20539. DOI: 10.5555/3666122.3667023. Online publication date: 10-Dec-2023
  • (2023) FederatedScope: A Flexible Federated Learning Platform for Heterogeneity. Proceedings of the VLDB Endowment 16:5, pages 1059-1072. DOI: 10.14778/3579075.3579081. Online publication date: 1-Jan-2023
  • (2023) Multi-Analyst Differential Privacy for Online Query Answering. Proceedings of the VLDB Endowment 16:4, pages 816-828. DOI: 10.14778/3574245.3574265. Online publication date: 21-Feb-2023
  • (2023) Differentially Private Data Release over Multiple Tables. Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 207-219. DOI: 10.1145/3584372.3588665. Online publication date: 18-Jun-2023
  • (2023) Differential Privacy for Government Agencies—Are We There Yet? Journal of the American Statistical Association 118:541, pages 761-773. DOI: 10.1080/01621459.2022.2161385. Online publication date: 5-Apr-2023
  • (2023) BROOK Dataset: A Playground for Exploiting Data-Driven Techniques in Human-Vehicle Interactive Designs. HCI in Mobility, Transport, and Automotive Systems, pages 191-209. DOI: 10.1007/978-3-031-35908-8_14. Online publication date: 9-Jul-2023
  • (2023) Characterizing and Optimizing Differentially-Private Techniques for High-Utility, Privacy-Preserving Internet-of-Vehicles. HCI in Mobility, Transport, and Automotive Systems, pages 31-50. DOI: 10.1007/978-3-031-35678-0_3. Online publication date: 9-Jul-2023
  • (2022) AIM. Proceedings of the VLDB Endowment 15:11, pages 2599-2612. DOI: 10.14778/3551793.3551817. Online publication date: 1-Jul-2022
