Abstract
This paper proposes a heuristic pretraining method for topic models. Although we consider latent Dirichlet allocation (LDA) here, the pretraining can be applied to other topic models. Our starting point is collapsed Gibbs sampling (CGS) for updating the latent variables. After every iteration of CGS, however, we regard the latent variables as observed and construct another LDA over them, which we call LDA over LDA (LoL). We then perform two further updates: an update of the latent variables in LoL by CGS, and an update of the latent variables in LDA based on the result of the preceding LoL update. One iteration of CGS for LDA and these two updates are performed alternately, but only during a small, early part of the inference; that is, the proposed method serves as a pretraining. The pretraining stage is followed by the usual iterations of CGS for LDA. An evaluation experiment shows that our pretraining can improve test set perplexity.
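The abstract gives no pseudocode, so the following is a minimal, hypothetical Python sketch of the alternation it describes: each pretraining iteration runs one CGS sweep for LDA, builds a LoL model by treating the current topic assignments as observed tokens over the vocabulary of topic indices, runs a CGS sweep on LoL, and then resamples the LDA assignments. The construction of LoL and the rule for updating the LDA assignments from LoL (the mkw-based resampling below) are illustrative assumptions, not the paper's specification; the hyperparameters and iteration budget are likewise placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_counts(docs, z, K, V):
    """Build the count tables used by collapsed Gibbs sampling."""
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts
    nk = np.zeros(K)         # topic totals
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw, nk

def cgs_sweep(docs, z, ndk, nkw, nk, alpha, beta, V):
    """One collapsed Gibbs sampling sweep over every token."""
    K = ndk.shape[1]
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            # remove the current assignment, sample from the full
            # conditional p(z = k | rest), then restore the counts
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][n] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Toy corpus: each document is a list of word ids from a vocabulary of size V.
V, K, K_META = 20, 5, 3
docs = [list(rng.integers(0, V, 30)) for _ in range(10)]
z = [list(rng.integers(0, K, len(doc))) for doc in docs]
ndk, nkw, nk = init_counts(docs, z, K, V)

PRETRAIN, TOTAL = 10, 100   # hypothetical iteration budget
for it in range(TOTAL):
    cgs_sweep(docs, z, ndk, nkw, nk, alpha=0.1, beta=0.01, V=V)
    if it < PRETRAIN:
        # LoL: regard the current assignments z as observed "words" over the
        # vocabulary of K topic indices, and fit a second LDA to them by CGS.
        meta_docs = [list(zd) for zd in z]
        mz = [list(rng.integers(0, K_META, len(md))) for md in meta_docs]
        mdk, mkw, mk = init_counts(meta_docs, mz, K_META, K)
        cgs_sweep(meta_docs, mz, mdk, mkw, mk, alpha=0.1, beta=0.01, V=K)
        # Update the LDA assignments from LoL. This resampling rule is an
        # assumption: draw each z[d][n] from the topic distribution of the
        # meta-topic that LoL assigned to that token.
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k_old = z[d][n]
                ndk[d, k_old] -= 1; nkw[k_old, w] -= 1; nk[k_old] -= 1
                p = mkw[mz[d][n]] + 0.01
                k_new = rng.choice(K, p=p / p.sum())
                z[d][n] = k_new
                ndk[d, k_new] += 1; nkw[k_new, w] += 1; nk[k_new] += 1
# After the pretraining phase (it >= PRETRAIN), the loop above reduces to
# plain CGS for LDA, as described in the abstract.
```

In this reading, LoL's count tables play the role of a learned prior over topic assignments: the pretraining nudges the LDA state toward configurations that are themselves well explained by a topic model before ordinary CGS takes over.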
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Masada, T., Takasu, A. (2015). Heuristic Pretraining for Topic Models. In: Ali, M., Kwon, Y., Lee, C.-H., Kim, J., Kim, Y. (eds.) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol. 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_34
DOI: https://doi.org/10.1007/978-3-319-19066-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19065-5
Online ISBN: 978-3-319-19066-2