DOI:10.1145/1265530.1265569
Article

Privacy, accuracy, and consistency too: a holistic solution to contingency table release

Published: 11 June 2007

Abstract

The contingency table is a workhorse of official statistics, the format of reported data for the US Census, Bureau of Labor Statistics, and the Internal Revenue Service. In many settings such as these, privacy is not only ethically mandated but frequently legally mandated as well. Consequently, there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at least one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously.
Our approach can be viewed as a special case of a more general approach for producing synthetic data: any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone, and hence without weakening privacy, we find and output the "nearest" consistent set of marginals. Interestingly, this set is no farther from the privacy-preserving marginals than the tables of the raw data are, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself.
The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.
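
The idea of projecting noisy marginals onto the "nearest" consistent set can be illustrated with a small sketch. This is not the algorithm of the paper: it assumes a single two-way table, treats a pair of row and column totals as consistent when their grand totals agree (ignoring non-negativity and integrality), measures distance in the Euclidean norm, and uses Laplace noise with an arbitrary illustrative scale. All function names, the example counts, and the noise scale are hypothetical.

```python
import math
import random


def marginals(table):
    """Row sums and column sums of a two-way table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def add_laplace_noise(values, scale, rng):
    """Perturb each value with Laplace(0, scale) noise.

    A Laplace variate is the scaled difference of two unit exponentials.
    The scale here is illustrative and not calibrated to any privacy level.
    """
    return [v + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for v in values]


def nearest_consistent(noisy_rows, noisy_cols):
    """Euclidean projection onto {(rows, cols) : sum(rows) == sum(cols)}.

    The disagreement between the two grand totals is spread evenly
    over all released cells.
    """
    gap = sum(noisy_rows) - sum(noisy_cols)
    shift = gap / (len(noisy_rows) + len(noisy_cols))
    rows = [r - shift for r in noisy_rows]
    cols = [c + shift for c in noisy_cols]
    return rows, cols


def distance(a, b):
    """Euclidean distance between two (rows, cols) pairs."""
    flat_a = a[0] + a[1]
    flat_b = b[0] + b[1]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)))


rng = random.Random(0)                      # fixed seed for repeatability
raw_table = [[12, 7, 3], [5, 20, 9]]        # hypothetical raw counts
true_marg = marginals(raw_table)
noisy_marg = (add_laplace_noise(true_marg[0], 2.0, rng),
              add_laplace_noise(true_marg[1], 2.0, rng))
fixed_marg = nearest_consistent(*noisy_marg)

print("gap before:", sum(noisy_marg[0]) - sum(noisy_marg[1]))
print("gap after: ", sum(fixed_marg[0]) - sum(fixed_marg[1]))
print("noisy -> fixed:", distance(noisy_marg, fixed_marg))
print("noisy -> true: ", distance(noisy_marg, true_marg))
```

Because the true marginals are themselves consistent, the projection can never move the noisy marginals farther than the distance to the true ones. By the triangle inequality, the repaired marginals are then within twice the noise distance of the truth, which is the sense in which consistency costs no more than privacy already did.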

References

[1]
Special Issue on Statistical Disclosure Control, volume 14(4) of Journal of Official Statistics. 1998.
[2]
D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In PODS. ACM, 2001.
[3]
R. Agrawal and R. Srikant. Privacy-preserving data mining. In W. Chen, J. F. Naughton, and P. A. Bernstein, editors, SIGMOD Conference, pages 439--450. ACM, 2000.
[4]
R. Agrawal, R. Srikant, and D. Thomas. Privacy preserving OLAP. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 251--262, New York, NY, USA, 2005. ACM Press.
[5]
M. Bacharach. Matrix rounding problems. Management Science, 9:732--742, 1966.
[6]
A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In C. Li, editor, PODS, pages 128--138. ACM, 2005.
[7]
J. Castro. Quadratic interior-point methods in statistical disclosure control. Computational Management Science, 2:107--121, 2005.
[8]
J. Castro. Minimum-distance controlled perturbation methods for large-scale tabular data protection. European Journal of Operational Research, 171:39--52, 2006.
[9]
S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee. Toward privacy in public databases. In J. Kilian, editor, TCC, volume 3378 of Lecture Notes in Computer Science, pages 363--385. Springer, 2005.
[10]
S. Chawla, C. Dwork, F. McSherry, and K. Talwar. On privacy-preserving histograms. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), Arlington, Virginia, 2005. AUAI Press.
[11]
L. Cox, J. Kelly, and R. Patil. Balancing quality and confidentiality in multivariate tabular data. Privacy in Statistical Databases, 3080:87--98, 2004.
[12]
T. Dalenius. Towards a methodology for statistical disclosure control. Statistisk Tidskrift, 3:213--225, 1977.
[13]
R. A. Dandekar and L. Cox. Synthetic tabular data: An alternative to complementary cell suppression, 2002. Manuscript, Energy Information Administration, US Department of Energy.
[14]
I. Dinur and K. Nissim. Revealing information while preserving privacy. In Milo [26], pages 202--210.
[15]
A. Dobra and S. Fienberg. Bounding entries in multi-way contingency tables given a set of marginal totals, 2002. Proceedings of Conference on Foundation of Statistical Inference and its Applications.
[16]
J. Domingo-Ferrer and V. Torra. A critique of the sensitivity rules usually employed for statistical table protection. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):545--556, 2002.
[17]
G. Duncan. Confidentiality and statistical disclosure limitation. In N. Smelser and P. Baltes, editors, International Encyclopedia of the Social and Behavioral Sciences. Elsevier, 2001.
[18]
C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1--12. Springer, 2006.
[19]
C. Dwork, D. Lee, and F. McSherry. Privacy preserving histogram case study, 2007. Manuscript.
[20]
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevi and T. Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265--284. Springer, 2006.
[21]
C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the 39th Annual Symposium on the Theory of Computation, 2007.
[22]
C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In M. K. Franklin, editor, CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528--544. Springer, 2004.
[23]
A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Milo [26], pages 211--222.
[24]
I. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, pages 7--18, 1972.
[25]
M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization (Algorithms and Combinatorics). Springer, December 1994.
[26]
T. Milo, editor. Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9--12, 2003, San Diego, CA, USA. ACM, 2003.
[27]
J. Kelly, A. Assad, and B. Golden. The controlled rounding problem: Relaxations and complexity issues. OR Spektrum, 12:129--138, 1990.
[28]
T. W. Körner. Fourier Analysis. Cambridge University Press, Cambridge, UK, 1988.
[29]
J. A. D. Loera and S. Onn. All rational polytopes are transportation polytopes and all polytopal integer sets are contingency tables. Proceedings of the 10th Ann. Math. Prog. Soc. Symp. Integ. Prog. Combin. Optim., LNCS, 3064:338--351, 2004.
[30]
J. A. D. Loera and S. Onn. The complexity of three-way statistical tables. SIAM J. Comput., 33:819--836, 2004.
[31]
J. A. D. Loera and S. Onn. Markov bases of three-way tables are arbitrarily complicated. J. Symb. Comput., 41(2):173--181, 2006.
[32]
A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In L. Liu, A. Reuter, K.-Y. Whang, and J. Zhang, editors, ICDE, page 24. IEEE Computer Society, 2006.
[33]
D. A. Robertson and R. Ethier. Cell suppression: Experience and theory. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 8--20. Springer, 2002.
[34]
D. Rubin. Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9:461--469, 1993.
[35]
P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression.
[36]
L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571--588, 2002.
[37]
L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002.

    Published In

    PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2007
    328 pages
    ISBN:9781595936851
    DOI:10.1145/1265530
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. OLAP
    2. contingency table
    3. privacy

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS07

    Acceptance Rates

    PODS '07 Paper Acceptance Rate 28 of 187 submissions, 15%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Cited By

    • (2025) Differentially private histogram with valid statistics. Statistics & Probability Letters 219 (110354). DOI:10.1016/j.spl.2024.110354. Online publication date: Apr-2025
    • (2024) Perturb-and-project. Proceedings of the 41st International Conference on Machine Learning (9161-9179). DOI:10.5555/3692070.3692435. Online publication date: 21-Jul-2024
    • (2024) A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First. Algorithms 17:7 (293). DOI:10.3390/a17070293. Online publication date: 4-Jul-2024
    • (2024) Differentially Private Data Generation with Missing Data. Proceedings of the VLDB Endowment 17:8 (2022-2035). DOI:10.14778/3659437.3659455. Online publication date: 31-May-2024
    • (2024) 30 Years of Synthetic Data. Statistical Science 39:2. DOI:10.1214/24-STS927. Online publication date: 1-May-2024
    • (2024) Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys. ACM Transactions on Database Systems 49:4 (1-40). DOI:10.1145/3697831. Online publication date: 26-Sep-2024
    • (2024) Continual Release of Differentially Private Synthetic Data from Longitudinal Data Collections. Proceedings of the ACM on Management of Data 2:2 (1-26). DOI:10.1145/3651595. Online publication date: 14-May-2024
    • (2024) Tabular Data Synthesis with GANs for Adaptive AI Models. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD) (242-246). DOI:10.1145/3632410.3632438. Online publication date: 4-Jan-2024
    • (2024) Collection and Analysis of Sensitive Data with Privacy Protection by a Distributed Randomized Response Protocol. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (1415-1424). DOI:10.1145/3605098.3636024. Online publication date: 8-Apr-2024
    • (2024) SenseHash: Computing on Sensor Values Mystified at the Origin. IEEE Transactions on Emerging Topics in Computing 12:2 (508-520). DOI:10.1109/TETC.2022.3217488. Online publication date: Apr-2024
