Self-organizing weighted incremental probabilistic latent semantic analysis

Li, Ning; Luo, Wenjuan; Yang, Kun; Zhuang, Fuzhen; He, Qing; Shi, Zhongzhi

doi:10.1007/s13042-017-0681-9

Original Article
Published: 26 April 2017

Volume 9, pages 1987–1998, (2018)
International Journal of Machine Learning and Cybernetics

Abstract

PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique that has been widely applied in text mining to discover the underlying topics embedded in a document corpus. However, because data keep growing and changing, it is necessary to discover dynamic topics and to process large datasets incrementally. Moreover, PLSA models have difficulty inferring the topics of new documents. To overcome these problems, in this paper we propose a novel Weighted Incremental PLSA algorithm, called WIPLSA, that dynamically discovers topics and incrementally learns them from new documents. Experiments show that WIPLSA can capture the dynamic topics hidden in a continuously updated data corpus. Compared with PLSA, MAP PLSA and QB PLSA, WIPLSA achieves better perplexity on a large dataset, which makes it applicable to big data mining. In addition, WIPLSA performs well in document categorization.
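The abstract evaluates models by perplexity and points to the difficulty of inferring topics for new documents. As a rough illustration of the standard PLSA baseline only (not WIPLSA, whose weighting and self-organizing update rules are not described here), the sketch below fits PLSA with EM, infers topic mixtures for unseen documents by "folding-in" (re-running EM on p(z|d) while holding p(w|z) fixed), and computes perplexity. The function names `plsa_em`, `fold_in` and `perplexity` are hypothetical, and the dense (D, K, W) posterior tensor is only suitable for toy data.

```python
import numpy as np


def _em_step(counts, p_z_d, p_w_z, update_words=True):
    """One EM iteration for PLSA on a document-word count matrix n(d, w)."""
    # E-step: p(z|d,w) is proportional to p(z|d) p(w|z); shape (D, K, W).
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]
    post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
    # Expected counts n(d,w) p(z|d,w).
    expected = counts[:, None, :] * post
    # M-step for p(z|d): normalize expected counts over topics.
    new_p_z_d = expected.sum(axis=2)
    new_p_z_d /= np.maximum(new_p_z_d.sum(axis=1, keepdims=True), 1e-12)
    if update_words:
        # M-step for p(w|z): normalize expected counts over the vocabulary.
        new_p_w_z = expected.sum(axis=0)
        new_p_w_z /= np.maximum(new_p_w_z.sum(axis=1, keepdims=True), 1e-12)
    else:
        new_p_w_z = p_w_z  # folding-in: topic-word distributions stay fixed
    return new_p_z_d, new_p_w_z


def _random_rows(rng, rows, cols):
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)


def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM; returns p(z|d) of shape (D, K) and p(w|z) of shape (K, W)."""
    rng = np.random.default_rng(seed)
    p_z_d = _random_rows(rng, counts.shape[0], n_topics)
    p_w_z = _random_rows(rng, n_topics, counts.shape[1])
    for _ in range(n_iter):
        p_z_d, p_w_z = _em_step(counts, p_z_d, p_w_z)
    return p_z_d, p_w_z


def fold_in(new_counts, p_w_z, n_iter=50, seed=0):
    """Infer p(z|d) for unseen documents while keeping p(w|z) fixed."""
    rng = np.random.default_rng(seed)
    p_z_d = _random_rows(rng, new_counts.shape[0], p_w_z.shape[0])
    for _ in range(n_iter):
        p_z_d, _ = _em_step(new_counts, p_z_d, p_w_z, update_words=False)
    return p_z_d


def perplexity(counts, p_z_d, p_w_z):
    """exp(-sum n(d,w) log p(w|d) / sum n(d,w)), where p(w|d) = sum_z p(z|d) p(w|z)."""
    p_w_d = p_z_d @ p_w_z
    log_lik = np.sum(counts * np.log(np.maximum(p_w_d, 1e-12)))
    return float(np.exp(-log_lik / counts.sum()))


if __name__ == "__main__":
    # Toy corpus: 4 documents over a 6-word vocabulary with two obvious themes.
    train = np.array([[5, 4, 3, 0, 0, 0],
                      [4, 5, 2, 0, 1, 0],
                      [0, 0, 1, 5, 4, 3],
                      [0, 1, 0, 4, 5, 4]], dtype=float)
    p_z_d, p_w_z = plsa_em(train, n_topics=2)
    new_docs = np.array([[3, 3, 2, 0, 0, 0]], dtype=float)
    theta_new = fold_in(new_docs, p_w_z)
    print("p(z|d_new) =", np.round(theta_new, 3))
    print("perplexity on new docs:", round(perplexity(new_docs, theta_new, p_w_z), 3))
```

Lower perplexity means the model assigns higher probability to the observed words; a uniform model over a vocabulary of size W has perplexity exactly W. Folding-in is one common way to handle new documents in plain PLSA, but it never updates p(w|z), so topics cannot drift over time; incremental variants such as the one proposed here aim to address that limitation.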

Similar content being viewed by others

Short Text Dynamic Clustering Approach for Semantic-Enhanced Knowledge

Topic Representation using Semantic-Based Patterns

LSA-PTM: A Propagation-Based Topic Model Using Latent Semantic Analysis on Heterogeneous Information Networks

Notes

http://news.163.com/special/.

Acknowledgements

The work is supported by the National Natural Science Foundation of China (No. 91546122, 61602438, 61573335, 61473273, 61473274, 61363058), National High-tech R&D Program of China (863 Program) (No. 2014AA015105), National Science and Technology Support Program (No. 2014BAK02B07), National major R&D program of Beijing Municipal Science & Technology Commission (Z161100002616032), Guangdong provincial science and technology plan projects (No. 2015 B 010109005).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
Ning Li
The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Ning Li, Wenjuan Luo, Fuzhen Zhuang, Qing He & Zhongzhi Shi
National Institute of Metrology, Beijing, 100029, China
Kun Yang

Corresponding author

Correspondence to Ning Li.

About this article

Cite this article

Li, N., Luo, W., Yang, K. et al. Self-organizing weighted incremental probabilistic latent semantic analysis. Int. J. Mach. Learn. & Cyber. 9, 1987–1998 (2018). https://doi.org/10.1007/s13042-017-0681-9

Received: 05 February 2016
Accepted: 10 April 2017
Published: 26 April 2017
Issue Date: December 2018
DOI: https://doi.org/10.1007/s13042-017-0681-9
