Abstract
Latent Dirichlet allocation (LDA) is a fundamental and widely used tool for text analysis. Collapsed Gibbs sampling (CGS), a widely adopted algorithm for learning the parameters of LDA, carries a risk of privacy leakage. In this paper, we study the inherent privacy of CGS, which can be exploited to protect latent topic updates. We propose a subsampling method, called group subsampling, and a novel centralized privacy-preserving algorithm, called Fast-Differentially-Private LDA (FDP-LDA), that together amplify the inherent privacy and improve the efficiency of traditional differentially private CGS. Theoretically, we derive a general upper bound on the amplified inherent privacy loss in each iteration of FDP-LDA. To the best of our knowledge, this is the first work to analyze the inherent privacy amplification of differentially private CGS. Experimentally, results on real-world datasets validate the improved performance of FDP-LDA.
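The abstract's idea of combining CGS for LDA with per-iteration subsampling can be illustrated with a minimal sketch. The code below is a generic collapsed Gibbs sampler that resamples only a random fraction of token positions each iteration; the function name, parameters, and the uniform subsampling step are illustrative assumptions, not the group-subsampling scheme or the privacy accounting of FDP-LDA itself.

```python
import numpy as np

def subsampled_cgs_lda(docs, n_topics, vocab_size, n_iters=50,
                       alpha=0.1, beta=0.01, sample_frac=0.5, seed=0):
    """Collapsed Gibbs sampling for LDA in which each iteration updates
    only a random subset of token positions (a generic subsampling step,
    used here only to illustrate the idea behind the paper's approach)."""
    rng = np.random.default_rng(seed)
    # Flatten the corpus into (doc_id, word_id) token positions.
    tokens = [(d, w) for d, doc in enumerate(docs) for w in doc]
    z = rng.integers(n_topics, size=len(tokens))        # topic assignments
    ndk = np.zeros((len(docs), n_topics))               # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))              # topic-word counts
    nk = np.zeros(n_topics)                             # topic totals
    for (d, w), k in zip(tokens, z):
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iters):
        # Subsample token positions: only these are resampled this round.
        n_sub = max(1, int(sample_frac * len(tokens)))
        for i in rng.choice(len(tokens), size=n_sub, replace=False):
            d, w = tokens[i]
            k = z[i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1  # remove token
            # Collapsed conditional p(z_i = k | z_-i, data).
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1  # add back
    return z, ndk, nkw
```

Because each iteration touches only a random subset of the data, a differential-privacy analysis of one update can benefit from the standard privacy-amplification-by-subsampling argument, which is the intuition the abstract appeals to.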
Notes
- 1.
https://archive.ics.uci.edu/ml/datasets/bag+of+words.
Acknowledgements
Hong Chen is the corresponding author. This work was supported by National Natural Science Foundation of China (62072460, 62076245, 62172424), Beijing Natural Science Foundation (4212022).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, T., Chen, H., Zhao, S. (2023). FDP-LDA: Inherent Privacy Amplification of Collapsed Gibbs Sampling via Group Subsampling. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13423. Springer, Cham. https://doi.org/10.1007/978-3-031-25201-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25200-6
Online ISBN: 978-3-031-25201-3
eBook Packages: Computer Science, Computer Science (R0)