research-article

Information Measures in Statistical Privacy and Data Processing Applications

Published: 01 June 2015

Abstract

In statistical privacy, utility refers to two concepts: information preservation, how much statistical information is retained by a sanitizing algorithm, and usability, how (and with how much difficulty) one extracts this information to build statistical models, answer queries, and so forth. Some scenarios incentivize a separation between information preservation and usability, so that the data owner first chooses a sanitizing algorithm to maximize a measure of information preservation, and, afterward, the data consumers process the sanitized output according to their various individual needs [Ghosh et al. 2009; Williams and McSherry 2010].
We analyze the information-preserving properties of utility measures with a combination of two new and three existing utility axioms and study how violations of an axiom can be fixed. We show that the average (over possible outputs of the sanitizer) error of Bayesian decision makers forms the unique class of utility measures that satisfy all of the axioms. The axioms are agnostic to Bayesian concepts such as subjective probabilities and hence strengthen support for Bayesian views in privacy research. In particular, this result connects information preservation to aspects of usability—if the information preservation of a sanitizing algorithm should be measured as the average error of a Bayesian decision maker, shouldn’t Bayesian decision theory be a good choice when it comes to using the sanitized outputs for various purposes? We put this idea to the test in the unattributed histogram problem where our decision-theoretic postprocessing algorithm empirically outperforms previously proposed approaches.
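
The central quantity in the abstract, the average (over sanitizer outputs) error of a Bayesian decision maker, can be illustrated for a finite domain. The following is a minimal sketch, not the paper's formulation: the function name `average_bayes_error`, the matrix layout, and the toy mechanisms are assumptions made for illustration, and the paper's sign and normalization conventions may differ. For each output y, the decision maker picks the action with the smallest posterior expected loss, and these errors are averaged over y.

```python
import numpy as np


def average_bayes_error(prior, mechanism, loss):
    """Average error of a Bayesian decision maker (hypothetical helper).

    prior:     shape (n_inputs,), P(x)
    mechanism: shape (n_inputs, n_outputs), P(y | x), each row sums to 1
    loss:      shape (n_actions, n_inputs), loss[a, x] of action a when x is true
    """
    prior = np.asarray(prior, dtype=float)
    mechanism = np.asarray(mechanism, dtype=float)
    loss = np.asarray(loss, dtype=float)

    # joint[x, y] = P(x) * P(y | x)
    joint = prior[:, None] * mechanism
    # expected[a, y] = sum_x loss[a, x] * P(x, y)
    expected = loss @ joint
    # For each output y take the best action, then sum over y.
    # This equals sum_y P(y) * min_a E[loss(a, x) | y].
    return float(expected.min(axis=0).sum())


# Toy setting (assumed): one secret bit, uniform prior, 0-1 loss.
prior = [0.5, 0.5]
zero_one = [[0.0, 1.0], [1.0, 0.0]]

identity = [[1.0, 0.0], [0.0, 1.0]]        # releases the bit unchanged
randomized = [[0.75, 0.25], [0.25, 0.75]]  # flips the bit with probability 0.25
constant = [[1.0, 0.0], [1.0, 0.0]]        # output independent of the input

for name, mech in [("identity", identity),
                   ("randomized", randomized),
                   ("constant", constant)]:
    print(name, average_bayes_error(prior, mech, zero_one))
```

Under this sketch a lower value means more information is preserved: the identity mechanism gives zero error, the constant mechanism leaves the decision maker with only the prior, and the randomized mechanism falls in between.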

Supplementary Material

a28-lin-apndx.pdf (lin.zip)
Supplemental movie, appendix, image, and software files for Information Measures in Statistical Privacy and Data Processing Applications

References

[1]
John M. Abowd and Simon D. Woodcock. 2001. Disclosure limitation in longitudinal linked data. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies (2001), 215--277.
[2]
Charu C. Aggarwal. 2008. On unifying privacy and uncertain data models. In ICDE.
[3]
Mário S. Alvim, Miguel E. Andrés, Konstantinos Chatzikokolakis, Pierpaolo Degano, and Catuscia Palamidessi. 2011b. Differential privacy: On the trade-off between utility and information leakage. http://arxiv.org/abs/1103.5188. (2011).
[4]
Mário S. Alvim, Miguel E. Andrés, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. 2011a. On the relation between differential privacy and quantitative information flow. In ICALP.
[5]
Mário S. Alvim, Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Geoffrey Smith. 2012. Measuring information leakage using generalized gain functions. In CSF.
[6]
Mina Askari, Reihaneh Safavi-Naini, and Ken Barker. 2012. An information theoretic privacy and utility measure for data sanitization mechanisms. In CODASPY.
[7]
Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Talwar. 2007. Privacy, accuracy and consistency too: A holistic solution to contingency table release. In PODS.
[8]
Richard E. Barlow, D. J. Bartholomew, J. M. Bremner, and H. D. Brunk. 1972. Statistical Inference Under Order Restrictions. John Wiley and Sons.
[9]
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.
[10]
Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. 2005. Practical privacy: The SuLQ framework. In PODS.
[11]
Avrim Blum, Katrina Ligett, and Aaron Roth. 2008. A learning theory approach to non-interactive database privacy. In STOC.
[12]
Hai Brenner and Kobbi Nissim. 2010. Impossibility of differentially private universally optimal mechanisms. In FOCS.
[13]
Bee-Chung Chen, Daniel Kifer, Kristen LeFevre, and Ashwin Machanavajjhala. 2009. Privacy-preserving data publishing. Foundations and Trends in Databases 2, 1--2 (2009), 1--167.
[14]
Thomas M. Cover and Joy A. Thomas. 1991. Elements of Information Theory. Wiley-Interscience.
[15]
Bolin Ding, Marianne Winslett, Jiawei Han, and Zhenhui Li. 2011. Differentially private data cubes: Optimizing noise sources and consistency. In SIGMOD.
[16]
Irit Dinur and Kobbi Nissim. 2003. Revealing information while preserving privacy. In PODS.
[17]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In TCC.
[18]
Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil Vadhan. 2009. On the complexity of differentially private data release: Efficient algorithms and hardness results. In STOC.
[19]
Bruce Ebanks, Prasanna Sahoo, and Wolfgang Sander. 1997. Characterizations of Information Measures. World Scientific Publishing Co.
[20]
Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. 2002. Privacy preserving mining of association rules. In KDD.
[21]
Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 2003. Bayesian Data Analysis (2nd ed.). Chapman & Hall/CRC.
[22]
Arpita Ghosh and Aaron Roth. 2011. Selling privacy at auction. In EC.
[23]
Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. 2009. Universally utility-maximizing privacy mechanisms. In STOC.
[24]
Mangesh Gupte and Mukund Sundararajan. 2010. Universally optimal privacy mechanisms for minimax agents. In PODS.
[25]
Moritz Hardt and Kunal Talwar. 2010. On the geometry of differential privacy. In STOC.
[26]
Michael Hay, Vibhor Rastogi, Gerome Miklau, and Dan Suciu. 2010. Boosting the accuracy of differentially-private histograms through consistency. In VLDB.
[27]
Ali Inan, Murat Kantarcioglu, and Elisa Bertino. 2009. Using anonymized data for classification. In ICDE.
[28]
Vijay S. Iyengar. 2002. Transforming data to satisfy privacy constraints. In KDD.
[29]
Daniel Kifer and Bing-Rong Lin. 2010. Towards an axiomatization of statistical privacy and utility. In PODS.
[30]
Daniel Kifer and Bing-Rong Lin. 2012. An axiomatic view of statistical privacy and utility. Journal of Privacy and Confidentiality 4, 1 (2012).
[31]
Chao Li, Michael Hay, Vibhor Rastogi, Gerome Miklau, and Andrew McGregor. 2010. Optimizing linear counting queries under differential privacy. In PODS.
[32]
Chao Li, Daniel Yang Li, Gerome Miklau, and Dan Suciu. 2013. A theory of pricing private data. In ICDT.
[33]
Bing-Rong Lin and Daniel Kifer. 2013. Information preservation in statistical privacy and Bayesian estimation of unattributed histograms. In SIGMOD.
[34]
Roderick J. A. Little. 1993. Statistical analysis of masked data. Journal of Official Statistics 9, 2 (1993), 407--426.
[35]
Annabelle McIver, Carroll Morgan, Geoffrey Smith, Barbara Espinoza, and Larissa Meinicke. 2014. Abstract channels and their robust information-leakage ordering. In POST.
[36]
Frank McSherry and Kunal Talwar. 2007. Mechanism design via differential privacy. In FOCS.
[37]
Roger B. Myerson. 1986. Axiomatic Foundations of Bayesian Decision Theory. Technical Report 671. Northwestern University Center for Mathematical Studies in Economics and Management Science. http://www.kellogg.northwestern.edu/research/math/papers/671.pdf.
[38]
J. B. Paris. 1994. The Uncertain Reasoner’s Companion. Cambridge University Press.
[39]
Davide Proserpio, Sharon Goldberg, and Frank McSherry. 2012. A workflow for differentially-private graph synthesis. In SIGCOMM Workshop on Online Social Networks.
[40]
T. E. Raghunathan, J. P. Reiter, and D. B. Rubin. 2003. Multiple imputation for statistical disclosure limitation. Journal of Official Statistics 19 (2003), 1--16.
[41]
Vibhor Rastogi, Dan Suciu, and Sungho Hong. 2007. The boundary between privacy and utility in data publishing. In VLDB.
[42]
Jerome P. Reiter. 2005. Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21 (2005), 441--462.
[43]
Tim Robertson and Paul Waltman. 1968. On estimating monotone parameters. Annals of Mathematical Statistics 39, 3 (1968), 1030--1039.
[44]
Tim Robertson and F. T. Wright. 1980. Algorithms in order restricted statistical inference and the Cauchy mean value property. Annals of Statistics 8, 3 (1980), 645--651.
[45]
Leonard J. Savage. 1972. The Foundations of Statistics. Dover.
[46]
Maurice Sion. 1958. On general minimax theorems. Pacific Journal of Mathematics 8 (1958), 171--176.
[47]
John von Neumann and Oskar Morgenstern. 2007. Theory of Games and Economic Behavior. Princeton University Press.
[48]
Oliver Williams and Frank McSherry. 2010. Probabilistic inference and differential privacy. In NIPS.
[49]
Raymond Wong, Ada Fu, Ke Wang, and Jian Pei. 2007. Minimality attack in privacy preserving data publishing. In VLDB.
[50]
Xiaokui Xiao and Yufei Tao. 2006. Anatomy: Simple and effective privacy preservation. In VLDB.
[51]
Qing Zhang, Nick Koudas, Divesh Srivastava, and Ting Yu. 2007. Aggregate query answering on anonymized tables. In ICDE.

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 9, Issue 4
    June 2015
    261 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/2786971
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2015
    Accepted: 01 November 2014
    Revised: 01 August 2014
    Received: 01 March 2014
    Published in TKDD Volume 9, Issue 4

    Author Tags

    1. Privacy
    2. decision theory
    3. differential privacy
    4. information measures
    5. minimax
    6. utility

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF
    • National Science Foundation

    Cited By

    • (2021) Balancing Privacy-Utility of Differential Privacy Mechanism. Security and Communication Networks 2021. DOI: 10.1155/2021/5592191. Online publication date: 1-Jan-2021
    • (2021) Risk-Aware Individual Trajectory Data Publishing With Differential Privacy. IEEE Access 9 (7421-7438). DOI: 10.1109/ACCESS.2020.3048394. Online publication date: 2021
    • (2020) A Survey on Privacy Properties for Data Publishing of Relational Data. IEEE Access 8 (51071-51099). DOI: 10.1109/ACCESS.2020.2980235. Online publication date: 2020
    • (2018) Privacy-Utility Trade-Off of K-Subset Mechanism. 2018 International Conference on Networking and Network Applications (NaNA) (212-217). DOI: 10.1109/NANA.2018.8648741. Online publication date: Oct-2018
    • (2018) An axiomatization of information flow measures. Theoretical Computer Science. DOI: 10.1016/j.tcs.2018.10.016. Online publication date: Oct-2018
    • (2017) Game theory based privacy preserving analysis in correlated data publication. Proceedings of the Australasian Computer Science Week Multiconference (1-10). DOI: 10.1145/3014812.3014887. Online publication date: 30-Jan-2017
    • (2017) Game Theory Based Correlated Privacy Preserving Analysis in Big Data. IEEE Transactions on Big Data (1-1). DOI: 10.1109/TBDATA.2017.2701817. Online publication date: 2017
    • (2017) Smart Meter Data Privacy: A Survey. IEEE Communications Surveys & Tutorials 19:4 (2820-2835). DOI: 10.1109/COMST.2017.2720195. Online publication date: Dec-2018
    • (2016) Axioms for Information Leakage. 2016 IEEE 29th Computer Security Foundations Symposium (CSF) (77-92). DOI: 10.1109/CSF.2016.13. Online publication date: Jun-2016
    • (2015) Privacy and the Price of Data. Proceedings of the 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) (16-16). DOI: 10.1109/LICS.2015.10. Online publication date: 6-Jul-2015
