DOI:10.1145/3491371.3491385

MGD: A Utility Metric for Private Data Publication

Published: 21 December 2021

Abstract

Differential privacy (DP) has become one of the most widely adopted techniques for protecting user data privacy. A common way to use private data under DP is to take an input dataset and synthesize a new dataset that preserves features of the input while satisfying DP. There is an inherent trade-off between the strength of privacy protection and the utility of the output: stronger protection requires more randomness, so the outputs usually have larger variance and can be far from optimal. In this paper, we summarize MarGinal Difference (MGD), the metric we proposed for the NIST “A Better Meter Stick for Differential Privacy” competition [26], which measures the utility of a synthesized dataset. Our metric is based on the earth mover’s distance. We introduce new features so that the metric is insensitive to the small random noise that is unavoidable under DP and instead focuses on significant differences. We show that our metric reflects range query error better than existing metrics. To alleviate the high computation cost of the earth mover’s distance, we introduce an efficient computation method based on min-cost flow.
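To make the intuition behind an earth-mover-based utility metric concrete, the following is a minimal sketch and not the MGD metric itself. It assumes a single one-way marginal over ordered bins with unit distance between adjacent bins, in which case the earth mover's distance reduces to the sum of absolute differences between cumulative counts; the general multi-dimensional case needs a transport solver such as the min-cost flow formulation mentioned above. The function names and the toy histograms are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def l1_distance(p, q):
    """Sum of absolute per-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms.

    Assumes both histograms share the same ordered bins, carry the same
    total mass, and that moving one unit of mass from bin i to bin j
    costs |i - j|. Under these assumptions the distance equals the sum
    of absolute differences between the running (cumulative) totals.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    running_diff = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(d) for d in running_diff)


def range_count(hist, lo, hi):
    """Answer a range query: total count in bins lo..hi (inclusive)."""
    return sum(hist[lo:hi + 1])


# Toy marginal: all ten records fall in the first bin.
original = [10, 0, 0, 0, 0]
# Two hypothetical synthetic marginals with the same per-bin (L1) error.
near = [0, 10, 0, 0, 0]   # mass shifted to the adjacent bin
far = [0, 0, 0, 0, 10]    # mass shifted to the farthest bin

if __name__ == "__main__":
    for name, synth in (("near", near), ("far", far)):
        print(
            name,
            "L1 =", l1_distance(original, synth),
            "EMD =", emd_1d(original, synth),
            "range[0,1] error =",
            abs(range_count(original, 0, 1) - range_count(synth, 0, 1)),
        )
```

In this toy case both synthetic marginals have an L1 distance of 20 from the original, yet the earth mover's distance is 10 for `near` and 40 for `far`. The range query over bins 0 to 1 is answered exactly by `near` and is off by 10 for `far`, which is the kind of difference a per-bin metric cannot see.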

References

[1]
Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 308–318.
[2]
John M Abowd. 2018. The US Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2867–2867.
[3]
Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice Hall.
[4]
Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. 214–223.
[5]
Avrim Blum, Katrina Ligett, and Aaron Roth. 2008. A learning theory approach to non-interactive database privacy. In STOC. 609–618.
[6]
Kuntai Cai, Xiaoyu Lei, Jianxin Wei, and Xiaokui Xiao. 2021. Data Synthesis via Differentially Private Markov Random Fields. Proceedings of the VLDB Endowment 13 (2021).
[7]
Scott Cohen and L Guibas. 1999. The earth mover’s distance under transformation sets. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2. IEEE, 1076–1083.
[8]
Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting Telemetry Data Privately. In Advances in Neural Information Processing Systems. 3574–3583.
[9]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In TCC. 265–284.
[10]
Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 1054–1067.
[11]
Facebook. [n.d.]. Opacus. https://opacus.ai/.
[12]
Giulia Fanti, Vasyl Pihur, and Úlfar Erlingsson. 2016. Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries. Proceedings on Privacy Enhancing Technologies (PoPETS) issue 3, 2016 (2016).
[13]
Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth, and Zhiwei Steven Wu. 2014. Dual Query: Practical private query release for high dimensional data. In International Conference on Machine Learning. 1170–1178.
[14]
Google. [n.d.]. TensorFlow Privacy. https://github.com/tensorflow/privacy.
[15]
Kristen Grauman and Trevor Darrell. 2004. Fast contour matching using approximate earth mover’s distance. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., Vol. 1. IEEE, I–I.
[16]
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. 2017. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems. 5767–5777.
[17]
Moritz Hardt, Katrina Ligett, and Frank McSherry. 2012. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems. 2339–2347.
[18]
Noah Johnson, Joseph P Near, and Dawn Song. 2018. Towards practical differential privacy for SQL queries. Proceedings of the VLDB Endowment 11, 5 (2018), 526–539.
[19]
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In International Conference on Learning Representations.
[20]
Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In International conference on machine learning. 957–966.
[21]
California State Legislature. [n.d.]. California Consumer Privacy Act of 2018. https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5.
[22]
Haibin Ling and Kazunori Okada. 2007. An efficient earth mover’s distance algorithm for robust histogram comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 5 (2007), 840–853.
[23]
Ryan McKenna, Siddhan Pradhan, Daniel Sheldon, and Gerome Miklau. 2021. Relaxed Marginal Consistency for Differentially Private Query Answering. arXiv (2021).
[24]
Ryan McKenna, Daniel Sheldon, and Gerome Miklau. 2019. Graphical-model based estimation and inference for differential privacy. In International Conference on Machine Learning. PMLR, 4435–4444.
[25]
NIST. [n.d.]. 2018 Differential Privacy Synthetic Data Challenge. https://www.nist.gov/ctl/pscr/open-innovation-prize-challenges/past-prize-challenges/2018-differential-privacy-synthetic.
[26]
NIST. [n.d.]. DeID2 - A Better Meter Stick for Differential Privacy. https://www.herox.com/bettermeterstick/teams.
[27]
NIST. [n.d.]. Differential Privacy Temporal Map Challenge: Sprint 1. https://www.drivendata.org/competitions/69/deid2-sprint-1-prescreened/page/263/.
[28]
Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, and Parvez Ahammad. 2020. LinkedIn’s Audience Engagements API: A privacy preserving data analytics system at scale. arXiv preprint arXiv:2002.05839 (2020).
[29]
Paul Voigt and Axel Von dem Bussche. 2017. The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing 10 (2017), 3152676.
[30]
Fan Wang and Leonidas J Guibas. 2012. Supervised earth mover’s distance learning and its computer vision applications. In European Conference on Computer Vision. Springer, 442–455.
[31]
Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey F. Naughton. 2017. Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD). 1307–1322.
[32]
Jun Zhang, Graham Cormode, Cecilia M Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2017. PrivBayes: Private data release via Bayesian networks. ACM Transactions on Database Systems (TODS) 42, 4 (2017), 25.
[33]
Zhikun Zhang, Tianhao Wang, Ninghui Li, Jean Honorio, Michael Backes, Shibo He, Jiming Chen, and Yang Zhang. 2021. PrivSyn: Differentially private data synthesis. In 30th USENIX Security Symposium (USENIX Security 21).

Cited By

  • (2023) Rectification of Syntactic and Semantic Privacy Mechanisms. IEEE Security and Privacy 21:5 (18-32). DOI:10.1109/MSEC.2022.3188365. Online publication date: 1-Sep-2023
  • (2023) A Survey on Privacy Preserving Synthetic Data Generation and a Discussion on a Privacy-Utility Trade-off Problem. Science of Cyber Security - SciSec 2022 Workshops (167-180). DOI:10.1007/978-981-19-7769-5_13. Online publication date: 1-Jan-2023

Published In

NSysS '21: Proceedings of the 8th International Conference on Networking, Systems and Security
December 2021
138 pages
ISBN:9781450387378
DOI:10.1145/3491371
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Privacy
  2. data synthesis
  3. metric

Qualifiers

  • Invited-talk
  • Research
  • Refereed limited

Funding Sources

  • United States National Science Foundation

Conference

8th NSysS 2021

Acceptance Rates

Overall Acceptance Rate 12 of 44 submissions, 27%

