DOI: 10.1145/3488932.3522771
keynote
Public Access

Differentially Private Data Synthesis: State of the Art and Challenges

Published: 30 May 2022

Abstract

Differential privacy has been accepted as the de facto standard notion of privacy protection. Companies and government agencies use differential privacy for privacy-preserving data analysis. For example, the US Census Bureau applied differential privacy in the 2020 census. One important approach to using a private dataset is to generate, in a way that satisfies differential privacy, a synthetic dataset that is similar to the private dataset. This enables data analysts to directly apply existing algorithms for performing data analysis. Furthermore, because any additional analysis performed on the published dataset is post-processing, it incurs no additional privacy cost. In recent years, the US National Institute of Standards and Technology (NIST) ran two competitions on differentially private data synthesis, which drove the development of practically effective data synthesis algorithms.
In this talk, I will discuss the current state of the art in private data synthesis, with a focus on approaches that performed well in the NIST competitions. One family of approaches uses probabilistic graphical models. My group's approach uses private marginals and a procedure similar to Iterative Proportional Fitting, which has been studied in many fields. I will also discuss the remaining challenges and open questions.
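
To make the "private marginals plus Iterative Proportional Fitting" idea concrete, here is a minimal sketch of one way such a pipeline could look. It is not the speaker's algorithm or any competition entry. The function names (`noisy_marginal`, `ipf`, `sample_synthetic`) are hypothetical, the Laplace scale of 1/epsilon assumes add-or-remove-one-record neighboring datasets, and splitting a total privacy budget across several marginals is left to the caller. Noisy marginals are generally not mutually consistent, so with real noise the fit will only approximate them.

```python
import numpy as np

def noisy_marginal(data, attrs, shape, epsilon, rng):
    """Laplace-noised contingency table over `attrs`, returned as probabilities.

    Assumption: neighbors differ by adding or removing one record, so the
    histogram has L1 sensitivity 1 and Laplace(1/epsilon) noise suffices.
    """
    sub_shape = tuple(shape[a] for a in attrs)
    counts = np.zeros(sub_shape)
    np.add.at(counts, tuple(data[:, a] for a in attrs), 1.0)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=sub_shape)
    # Clipping and normalizing are post-processing, so they cost no privacy.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total <= 0:
        return np.full(sub_shape, 1.0 / noisy.size)
    return noisy / total

def ipf(shape, marginals, iterations=50):
    """Fit a joint table to target marginals by iterative proportional fitting.

    `marginals` maps attribute-index tuples (sorted ascending) to target
    probability tables over those attributes.
    """
    joint = np.full(shape, 1.0 / np.prod(shape))
    all_axes = tuple(range(len(shape)))
    for _ in range(iterations):
        for attrs, target in marginals.items():
            other = tuple(ax for ax in all_axes if ax not in attrs)
            current = joint.sum(axis=other, keepdims=True)
            bshape = [shape[ax] if ax in attrs else 1 for ax in all_axes]
            # Rescale each cell so the joint's marginal matches the target.
            ratio = np.divide(target.reshape(bshape), current,
                              out=np.zeros_like(current), where=current > 0)
            joint = joint * ratio
    return joint

def sample_synthetic(joint, n, rng):
    """Draw n synthetic records (rows of attribute indices) from the joint."""
    flat = joint.ravel() / joint.sum()
    idx = rng.choice(flat.size, size=n, p=flat)
    return np.stack(np.unravel_index(idx, joint.shape), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (2, 3, 2)                        # domain sizes of three attributes
    private = rng.integers(0, [2, 3, 2], size=(1000, 3))
    eps_each = 0.5                           # two marginals, total epsilon 1.0
    targets = {
        (0, 1): noisy_marginal(private, (0, 1), shape, eps_each, rng),
        (1, 2): noisy_marginal(private, (1, 2), shape, eps_each, rng),
    }
    synthetic = sample_synthetic(ipf(shape, targets), 1000, rng)
    print(synthetic[:5])
```

Only `noisy_marginal` touches the private data. Fitting and sampling operate on its noisy output alone, which is why any analysis run on the synthetic records incurs no further privacy cost. The sketch materializes the full joint table, which is practical only for a handful of low-cardinality attributes.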


    Published In

    ASIA CCS '22: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security
    May 2022
    1291 pages
    ISBN:9781450391405
    DOI:10.1145/3488932
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. differential privacy
    2. microdata publishing
    3. synthetic data

    Qualifiers

    • Keynote

    Acceptance Rates

    Overall Acceptance Rate 418 of 2,322 submissions, 18%
