Online belief propagation algorithm for probabilistic latent semantic analysis

Ye, Yun; Gong, Shengrong; Liu, Chunping; Zeng, Jia; Jia, Ning; Zhang, Yi

doi:10.1007/s11704-013-2360-7

Research Article
Published: 06 June 2013

Frontiers of Computer Science, Volume 7, pages 526–535 (2013)

Yun Ye¹,², Shengrong Gong¹, Chunping Liu¹, Jia Zeng¹, Ning Jia² & Yi Zhang²

Abstract

Probabilistic latent semantic analysis (PLSA) is a topic model for text documents that has been widely used in text mining, computer vision, computational biology, and other fields. For batch PLSA inference algorithms, the required memory grows linearly with the data size, which makes massive data streams very difficult to handle. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph of PLSA facilitates the classic belief propagation (BP) algorithm. Furthermore, OBP splits the data stream into a set of small segments and uses the parameters estimated from previous segments to compute the gradient descent update for the current segment. Because OBP removes each segment from memory after processing, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets and demonstrate that OBP is competitive with online expectation maximization (OEM) for PLSA in both speed and accuracy, and that it can also give a more accurate topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
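
The segment-wise processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' OBP implementation: it replaces message passing on the factor graph with EM-style responsibilities and blends each segment's estimate into the global parameters using a decaying step size, which is a generic stochastic-approximation scheme. The function name `online_plsa` and the parameters `n_inner`, `tau0`, and `kappa` are hypothetical and chosen only for illustration.

```python
import numpy as np

def online_plsa(segments, n_words, n_topics, n_inner=20, tau0=1.0, kappa=0.7, seed=0):
    """Illustrative streaming PLSA: only the topic-word matrix persists across segments."""
    rng = np.random.default_rng(seed)
    # Global parameters phi[k, w] = p(w | k); the only state kept between segments.
    phi = rng.random((n_topics, n_words)) + 1e-2
    phi /= phi.sum(axis=1, keepdims=True)

    for s, counts in enumerate(segments, start=1):
        # counts: (n_docs, n_words) word-count matrix for the current segment.
        n_docs = counts.shape[0]
        # Local parameters theta[d, k] = p(k | d), re-initialised for each segment.
        theta = np.full((n_docs, n_topics), 1.0 / n_topics)

        for _ in range(n_inner):
            # Responsibilities mu[d, w, k] proportional to theta[d, k] * phi[k, w].
            mu = theta[:, None, :] * phi.T[None, :, :]
            mu /= mu.sum(axis=2, keepdims=True) + 1e-12
            weighted = counts[:, :, None] * mu
            # Re-estimate document-topic proportions with phi held fixed.
            theta = weighted.sum(axis=1)
            theta /= theta.sum(axis=1, keepdims=True) + 1e-12

        # Topic-word estimate from this segment alone.
        phi_seg = weighted.sum(axis=0).T + 1e-12
        phi_seg /= phi_seg.sum(axis=1, keepdims=True)

        # Decaying step size (assumed schedule); blend the segment estimate into phi.
        rho = (tau0 + s) ** (-kappa)
        phi = (1.0 - rho) * phi + rho * phi_seg
        # The segment's counts, theta, and mu are not retained after this point.

    return phi
```

Because `segments` can be any iterable, including a generator that reads one segment at a time, the memory held across segments is limited to the `n_topics` by `n_words` matrix `phi`. The dense `mu` array is suitable only for small demonstrations; a practical version would iterate over nonzero counts.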

Author information

Authors and Affiliations

School of Computer Science & Technology, Soochow University, Suzhou, 215006, China
Yun Ye, Shengrong Gong, Chunping Liu & Jia Zeng
Feng Chao Revenue, Baidu Online Network Technology Co., LTD, Beijing, 100000, China
Yun Ye, Ning Jia & Yi Zhang

Corresponding author

Correspondence to Shengrong Gong.

Additional information

Yun Ye received her BS in mathematics and applied mathematics from Nanjing University of Finance and Economics in 2010. She is currently an MS candidate at Soochow University, where she is studying topic modeling for dynamic network data.

Shengrong Gong received his MS from Harbin Institute of Technology in 1993, and his PhD from Beihang University in 2001. He is a professor and doctoral supervisor of the School of Computer Science and Technology, Soochow University. His research interests are image and video processing, pattern recognition, and computer vision.

Chunping Liu is an associate professor of the School of Computer Science and Technology, Soochow University. In 2002, she received her PhD in Pattern Recognition and Intelligent Systems Engineering from the Department of Computer Science, Nanjing University of Science and Technology. Her research interests are image and video processing, pattern recognition, and computer vision.

Jia Zeng received his BE from Wuhan University of Technology, Wuhan, China, in 2002, and his PhD from the City University of Hong Kong, in 2007. He is a professor in the School of Computer Science and Technology, Soochow University. His research interests are machine learning and computational biology. He is a member of the CCF, the IEEE, and the ACM.

Ning Jia is a senior development engineer of Baidu, Inc. He received his PhD from the Institute of Acoustics, Chinese Academy of Sciences (IACAS) in 2008. His research interest is topic models.

Yi Zhang is a senior development engineer of Baidu, Inc. He currently leads the keyword recommender group of the Electronic Commerce department and focuses mainly on sponsored search, machine learning, and data mining. He received his MS in Computer Science from Zhejiang University in 2008.

About this article

Cite this article

Ye, Y., Gong, S., Liu, C. et al. Online belief propagation algorithm for probabilistic latent semantic analysis. Front. Comput. Sci. 7, 526–535 (2013). https://doi.org/10.1007/s11704-013-2360-7

Received: 20 November 2012
Accepted: 21 February 2013
Published: 06 June 2013
Issue Date: August 2013
DOI: https://doi.org/10.1007/s11704-013-2360-7
