Abstract
In this paper, we focus on the problem of evolutionary theme patterns (ETP) analysis in cross-lingual scenarios. Cross-lingual topic models have previously been explored in batch mode. Applying such techniques directly to ETP analysis, however, raises two limitations. (1) Re-training all latent themes for every time interval in the sequence is time-consuming. (2) The latent themes of two adjacent time intervals may lose continuity. This motivates us to use online algorithms to address these limitations. Online topic models have been studied before, but previous work cannot be applied directly because it mainly targets monolingual texts. Consequently, we propose an online cross-lingual topic model. Experiments on a real-world dataset demonstrate that our algorithm performs well in the ETP analysis task: it reduces the time complexity of updating and effectively addresses the continuity limitation.
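The two limitations can be illustrated with a small sketch. The code below is a monolingual, warm-started PLSI fitted by EM; it is not the authors' online cross-lingual algorithm and omits any coupling between languages. The function names (`plsi_em`, `log_likelihood`, `cosine`) and the toy data are hypothetical. Each time interval is fitted only on its own new documents, which avoids re-training on the whole sequence, and topic k is initialized from topic k of the previous interval, so theme indices stay aligned across adjacent intervals.

```python
import math
import random


def plsi_em(docs, vocab_size, num_topics, init_p_w_z=None, iters=50, seed=0):
    """Fit PLSI by EM on the documents of one time interval (hypothetical helper).

    docs: list of {word_id: count} dicts (bag of words per document).
    init_p_w_z: optional topic-word distributions p(w|z) from the previous
        interval. When given, EM starts from them (warm start) instead of a
        random initialization, so topic k here starts as topic k there.
    Returns (p_w_z, p_z_d).
    """
    rng = random.Random(seed)
    if init_p_w_z is None:
        p_w_z = []
        for _ in range(num_topics):
            row = [rng.random() + 1e-3 for _ in range(vocab_size)]
            total = sum(row)
            p_w_z.append([x / total for x in row])
    else:
        p_w_z = [list(row) for row in init_p_w_z]  # copy; keep old model intact
    # p(z|d) starts uniform for every document in the new interval.
    p_z_d = [[1.0 / num_topics] * num_topics for _ in docs]

    for _ in range(iters):
        counts_w_z = [[0.0] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            counts_z_d = [0.0] * num_topics
            for w, n in doc.items():
                # E-step: p(z|d,w) is proportional to p(z|d) * p(w|z).
                post = [p_z_d[d][k] * p_w_z[k][w] for k in range(num_topics)]
                norm = sum(post) or 1.0
                for k in range(num_topics):
                    c = n * post[k] / norm
                    counts_z_d[k] += c
                    counts_w_z[k][w] += c
            # M-step for p(z|d): normalize this document's expected counts.
            total = sum(counts_z_d)
            if total > 0:
                p_z_d[d] = [c / total for c in counts_z_d]
        # M-step for p(w|z): normalize the expected topic-word counts.
        for k in range(num_topics):
            total = sum(counts_w_z[k])
            if total > 0:
                p_w_z[k] = [c / total for c in counts_w_z[k]]
    return p_w_z, p_z_d


def log_likelihood(docs, p_w_z, p_z_d):
    """Sum over (d, w) of n(d, w) * log(sum_z p(z|d) * p(w|z))."""
    ll = 0.0
    for d, doc in enumerate(docs):
        for w, n in doc.items():
            p = sum(p_z_d[d][k] * p_w_z[k][w] for k in range(len(p_w_z)))
            ll += n * math.log(max(p, 1e-300))
    return ll


def cosine(a, b):
    """Cosine similarity, used here to compare one theme across intervals."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Toy vocabulary: words 0-2 form one theme, words 3-5 another.
    interval_1 = [{0: 5, 1: 4, 2: 3}, {3: 5, 4: 4, 5: 3},
                  {0: 2, 1: 3, 2: 4}, {3: 3, 4: 2, 5: 5}]
    interval_2 = [{0: 3, 1: 3, 2: 3}, {3: 4, 4: 3, 5: 4}]
    topics_1, _ = plsi_em(interval_1, vocab_size=6, num_topics=2)
    # Only interval_2's documents are processed; topics start from topics_1.
    topics_2, _ = plsi_em(interval_2, 6, 2, init_p_w_z=topics_1)
    for k in range(2):
        print(k, round(cosine(topics_1[k], topics_2[k]), 3))
```

Here continuity comes only from the initialization. Published online variants often also carry over or decay statistics from earlier intervals rather than relying on the starting point alone; this sketch does not.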
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xin, X., Zhuang, K., Fang, Y., Huang, H. (2013). Online Cross-Lingual PLSI for Evolutionary Theme Patterns Analysis. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7818. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37453-1_7
DOI: https://doi.org/10.1007/978-3-642-37453-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37452-4
Online ISBN: 978-3-642-37453-1
eBook Packages: Computer Science, Computer Science (R0)