PrivGMM: Probability Density Estimation with Local Differential Privacy

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12112)

Abstract

Probability density estimation is a fundamental task in data analysis: it estimates the unobservable underlying probability density function from observed data. However, the data used for density estimation may contain sensitive information, and publishing the original data would compromise individuals' privacy. To address this problem, this paper proposes a private parametric probability density estimation mechanism called PrivGMM. It provides strong privacy guarantees locally (e.g., on personal computers or mobile phones) and efficiently (i.e., at small computation cost) for users. Meanwhile, it provides data collectors with accurate estimates of the parameters of the probability density model. Specifically, in the local setting, each user adds noise to his/her original data under the constraint of local differential privacy. On the server side, we employ the Gaussian Mixture Model, a popular model for approximating distributions. To reduce the effect of noise, we formulate the parametric estimation problem with a multi-layer latent variable structure and use the Expectation-Maximization algorithm to fit the Gaussian Mixture Model. Experiments on real datasets validate that our mechanism outperforms state-of-the-art methods.
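
The two-sided pipeline described in the abstract can be illustrated with a minimal sketch. It is not the PrivGMM algorithm itself: the abstract does not specify the noise mechanism or the multi-layer latent variable formulation, so the sketch assumes one-dimensional data clipped to a known range, the classical Gaussian mechanism on the user side, and a simpler server-side stand-in that runs plain EM on the noisy reports and then subtracts the known noise variance from each component. All function names and parameter values below are hypothetical.

```python
import numpy as np

def gaussian_mechanism_sigma(eps, delta, sensitivity):
    """Classical Gaussian-mechanism noise scale (assumed here; valid for 0 < eps < 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / eps

def perturb_locally(x, lo, hi, eps, delta, rng):
    """User side: clip each value to [lo, hi], then add Gaussian noise."""
    sigma = gaussian_mechanism_sigma(eps, delta, hi - lo)
    clipped = np.clip(x, lo, hi)
    return clipped + rng.normal(0.0, sigma, size=clipped.shape), sigma

def fit_gmm_from_noisy(y, k, noise_sigma, n_iter=200, seed=0):
    """Server side: EM for a 1-D GMM on noisy reports, then variance correction."""
    rng = np.random.default_rng(seed)
    weights = np.full(k, 1.0 / k)
    means = rng.choice(y, size=k, replace=False)
    variances = np.full(k, np.var(y))
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        diff = y[:, None] - means[None, :]
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * diff ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances of the noisy mixture.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * y[:, None]).sum(axis=0) / nk
        variances = (resp * (y[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)
    # Independent Gaussian noise adds noise_sigma**2 to every component variance,
    # so subtract it (with a small floor) to estimate the original variances.
    denoised = np.maximum(variances - noise_sigma ** 2, 1e-6)
    return weights, means, denoised

# Synthetic two-component data in [0, 1] (illustrative values only).
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.25, 0.05, 5000), rng.normal(0.75, 0.05, 5000)])
y, sigma = perturb_locally(x, 0.0, 1.0, eps=0.9, delta=1e-5, rng=rng)
weights, means, variances = fit_gmm_from_noisy(y, k=2, noise_sigma=sigma)
```

With a strict privacy budget the noise scale is several times the data range, so a naive correction like this needs a very large number of reports before the estimates become useful; that gap is the kind of problem a noise-aware formulation is meant to address.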

Acknowledgments

This work was supported by the Anhui Initiative in Quantum Information Technologies (No. AHY150300). Wei Yang and Shaowei Wang are both corresponding authors.

Author information

Correspondence to Wei Yang or Shaowei Wang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Diao, X., Yang, W., Wang, S., Huang, L., Xu, Y. (2020). PrivGMM: Probability Density Estimation with Local Differential Privacy. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_7
