Abstract
Probability density estimation is a fundamental task in data analysis: it recovers the unobservable underlying probability density function from observed data. However, the data used for density estimation may contain sensitive information, and publishing the original data would compromise individuals’ privacy. To address this problem, we propose a private parametric probability density estimation mechanism called PrivGMM. For users, it provides strong privacy guarantees locally (e.g., on personal computers or mobile phones) and efficiently (i.e., at small computational cost). For data collectors, it provides an accurate estimate of the parameters of the probability density model. Specifically, in the local setting, each user adds noise to his/her original data under the constraint of local differential privacy. On the server side, we employ the Gaussian Mixture Model, a popular model for approximating distributions. To reduce the effect of the noise, we formulate the parametric estimation problem with a multi-layer latent variable structure and use the Expectation-Maximization algorithm to fit the Gaussian Mixture Model. Experiments on real datasets validate that our mechanism outperforms state-of-the-art methods.
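The two-sided pipeline described in the abstract (users perturb locally, the server fits a Gaussian mixture with EM) can be illustrated with a minimal sketch. It is not the PrivGMM algorithm itself: the abstract does not specify the noise distribution or the multi-layer latent variable structure, so the sketch assumes one-dimensional data clipped to a known range, zero-mean Gaussian noise calibrated with the classical Gaussian-mechanism bound (valid for 0 < epsilon < 1), and a plain EM that subtracts the known noise variance from each component variance. All function names and parameters below are hypothetical.

```python
import math
import random


def gaussian_noise_std(epsilon, delta, lower, upper):
    """Classical Gaussian-mechanism calibration; assumes 0 < epsilon < 1."""
    sensitivity = upper - lower  # largest gap between two clipped inputs
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def perturb(value, noise_std, lower, upper, rng):
    """User side: clip to [lower, upper], then add zero-mean Gaussian noise."""
    clipped = min(max(value, lower), upper)
    return clipped + rng.gauss(0.0, noise_std)


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_noisy_gmm(reports, k, noise_std, iters=100, min_var=1e-6):
    """Server side: EM for a 1-D GMM observed through additive Gaussian noise.

    A noisy component has variance var_j + noise_std**2, so the M-step
    subtracts the known noise variance to estimate the original var_j.
    """
    n = len(reports)
    noise_var = noise_std ** 2
    ordered = sorted(reports)
    # Deterministic start: evenly spaced quantiles of the noisy reports.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    weights = [1.0 / k] * k
    grand_mean = sum(reports) / n
    sample_var = sum((x - grand_mean) ** 2 for x in reports) / n
    variances = [max(sample_var - noise_var, min_var)] * k

    for _ in range(iters):
        # E-step: responsibilities under the noisy component densities.
        resp = []
        for x in reports:
            logs = [math.log(weights[j])
                    + log_normal_pdf(x, means[j], variances[j] + noise_var)
                    for j in range(k)]
            top = max(logs)  # shift before exponentiating to avoid underflow
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted moments, then remove the noise variance.
        for j in range(k):
            mass = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(mass / n, 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, reports)) / mass
            noisy_var = sum(r[j] * (x - means[j]) ** 2
                            for r, x in zip(resp, reports)) / mass
            variances[j] = max(noisy_var - noise_var, min_var)
    return weights, means, variances
```

In this sketch a user would call `perturb` once per value with the standard deviation returned by `gaussian_noise_std`, and the server would pass the collected reports and the same standard deviation to `fit_noisy_gmm`. The variance correction only works because Gaussian noise added to a Gaussian component yields another Gaussian; a different noise distribution would need a different estimator.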
Acknowledgments
This work was supported by the Anhui Initiative in Quantum Information Technologies (No. AHY150300). Wei Yang and Shaowei Wang are both corresponding authors.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Diao, X., Yang, W., Wang, S., Huang, L., Xu, Y. (2020). PrivGMM: Probability Density Estimation with Local Differential Privacy. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_7
DOI: https://doi.org/10.1007/978-3-030-59410-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics (R0)