Abstract
Topic modeling has emerged over the last decade as a powerful tool for analyzing large text corpora, including Web-based user-generated texts. Topic stability, however, remains a concern: topic models have a complex optimization landscape with many local maxima, and even different runs of the same model can yield very different topics. Aiming to make topic modeling more stable, we propose an approach based on local density regularization, in which words in a local context window around a given word are more likely to be assigned the same topic as that word. We compare several models with local density regularizers and show that they improve topic stability while remaining on par with classical models in terms of quality metrics.
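The core idea can be illustrated with a small sketch. In collapsed Gibbs sampling for LDA, each word's topic is resampled from a distribution proportional to topic-word and document-topic counts; a local density regularizer additionally boosts topics that already appear among the word's neighbors within a context window. The sketch below is a simplified illustration under assumed names (`sample_topic`, the boost factor `lam`, and the multiplicative form of the regularizer are our own assumptions, not the paper's exact formulation):

```python
import numpy as np

def sample_topic(doc, i, z, n_wt, n_dt, n_t, alpha, beta, V, K,
                 window=2, lam=1.0, rng=np.random):
    """Resample the topic of word i in doc with a local density boost.

    doc  : list of word ids in the document
    z    : current topic assignments for the words in doc
    n_wt : K x V topic-word counts (word i's current assignment excluded)
    n_dt : length-K topic counts for this document (likewise excluded)
    n_t  : length-K total counts per topic (likewise excluded)
    """
    w = doc[i]
    # standard collapsed-Gibbs LDA factor
    p = (n_wt[:, w] + beta) / (n_t + V * beta) * (n_dt + alpha)
    # local density regularizer: count neighbors sharing each topic
    lo, hi = max(0, i - window), min(len(doc), i + window + 1)
    neigh = np.bincount([z[j] for j in range(lo, hi) if j != i],
                        minlength=K)
    p *= 1.0 + lam * neigh   # topics dense in the local window get boosted
    p /= p.sum()
    return rng.choice(K, p=p)
```

With `lam=0` this reduces to plain collapsed Gibbs sampling; larger `lam` pushes neighboring words toward shared topics, which is the stabilizing effect the abstract describes.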
Acknowledgments
This work was supported by the Basic Research Program of the National Research University Higher School of Economics.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Koltcov, S., Nikolenko, S.I., Koltsova, O., Filippov, V., Bodrunova, S. (2016). Stable Topic Modeling with Local Density Regularization. In: Bagnoli, F., et al. (eds.) Internet Science. INSCI 2016. Lecture Notes in Computer Science, vol. 9934. Springer, Cham. https://doi.org/10.1007/978-3-319-45982-0_16
Print ISBN: 978-3-319-45981-3
Online ISBN: 978-3-319-45982-0