Locally differentially private high-dimensional data synthesis

  • Research Paper
  • From CAS & CAE Members
Science China Information Sciences

Abstract

In local differential privacy (LDP), a key challenge is synthesizing high-dimensional data while efficiently capturing the correlations between attributes in a dataset. Existing solutions for low-dimensional data synthesis partition the privacy budget among all attributes; they become ineffective in high-dimensional scenarios because of the large noise and communication cost that the high dimension incurs. High dimensionality, however, not only poses challenges but also makes it possible to apply techniques that break this bottleneck. This paper presents SamPrivSyn, a method for high-dimensional data synthesis under LDP that consists of a marginal sampling module and a data generation module. The marginal sampling module samples from the original data to obtain two-way marginals. The sampling process is guided by mutual information, which is updated iteratively to retain as much of the correlation between attributes as possible. The data generation module reconstructs the synthetic dataset from the sampled two-way marginals. Comparison experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed method: the results show that SamPrivSyn protects privacy while retaining the correlation information between attributes.
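
As a rough illustration of the ideas in the abstract (not the SamPrivSyn algorithm itself, whose details are not given on this page), the following Python sketch estimates a two-way marginal under LDP and scores attribute pairs by mutual information. It assumes generalized randomized response (GRR) as the LDP primitive and assumes that each user reports a single attribute pair with the full budget ε instead of splitting ε across attributes. The function names, the toy data, and the setting ε = 2 are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import product


def grr_perturb(value, domain, epsilon, rng):
    """GRR: keep the true value with probability p, else report another value uniformly."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Unbiased GRR frequency estimates, clipped at zero and renormalised."""
    n, d = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = 1.0 / (math.exp(epsilon) + d - 1)
    counts = Counter(reports)
    raw = {v: (counts[v] / n - q) / (p - q) for v in domain}
    clipped = {v: max(f, 0.0) for v, f in raw.items()}
    total = sum(clipped.values())
    return {v: f / total for v, f in clipped.items()}


def mutual_information(joint):
    """I(A;B) in nats from a two-way marginal given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), pr in joint.items():
        pa[a] += pr
        pb[b] += pr
    return sum(
        pr * math.log(pr / (pa[a] * pb[b]))
        for (a, b), pr in joint.items()
        if pr > 0
    )


def estimate_pair_marginal(records, i, j, epsilon, rng):
    """Each user in `records` perturbs the pair (attribute i, attribute j) with GRR."""
    domain = list(product((0, 1), repeat=2))  # binary attributes (assumption)
    reports = [grr_perturb((r[i], r[j]), domain, epsilon, rng) for r in records]
    return grr_estimate(reports, domain, epsilon)


def demo(seed=0, n=40000, epsilon=2.0):
    """Toy data: b copies a 90% of the time, c is independent of a."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        a = rng.randint(0, 1)
        b = a if rng.random() < 0.9 else 1 - a
        c = rng.randint(0, 1)
        records.append((a, b, c))
    half = n // 2  # disjoint user groups, one attribute pair per group
    joint_ab = estimate_pair_marginal(records[:half], 0, 1, epsilon, rng)
    joint_ac = estimate_pair_marginal(records[half:], 0, 2, epsilon, rng)
    return mutual_information(joint_ab), mutual_information(joint_ac)


mi_ab, mi_ac = demo()
print(f"estimated MI(a,b) = {mi_ab:.3f} nats, estimated MI(a,c) = {mi_ac:.3f} nats")
```

In this toy setting the correlated pair (a, b) should receive a noticeably higher estimated mutual information than the independent pair (a, c). A sampling scheme guided by mutual information could use such scores to decide which two-way marginals are worth collecting; how SamPrivSyn actually does this is described in the full paper, not here.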

Acknowledgements

This work was supported by Strategic Research and Consulting Project of the Chinese Academy of Engineering (Grant No. 2022-XY-107).

Author information

Corresponding author

Correspondence to Changjun Jiang.

About this article

Cite this article

Chen, X., Wang, C., Yang, Q. et al. Locally differentially private high-dimensional data synthesis. Sci. China Inf. Sci. 66, 112101 (2023). https://doi.org/10.1007/s11432-022-3583-x

  • DOI: https://doi.org/10.1007/s11432-022-3583-x
