
Stable Topic Modeling with Local Density Regularization

  • Conference paper
Internet Science (INSCI 2016)

Part of the book series: Lecture Notes in Computer Science, volume 9934

Abstract

Topic modeling has emerged over the last decade as a powerful tool for analyzing large text corpora, including Web-based user-generated texts. Topic stability, however, remains a concern: topic models have a very complex optimization landscape with many local maxima, and different runs of the same model can yield very different topics. To add stability to topic modeling, we propose an approach based on local density regularization, in which words in the local context window of a given word have a higher probability of being assigned the same topic as that word. We compare several models with local density regularizers and show that they can improve topic stability while remaining on par with classical models in terms of quality metrics.
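The paper's concrete regularizers are defined in the full text; as an illustrative sketch of the general idea only, the following toy collapsed Gibbs sampler for LDA multiplies each token's sampling distribution by a bonus for topics already assigned to words within a local context window. The function name and the `window` and `lam` parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def gibbs_lda_local_density(docs, n_topics, vocab_size, n_iter=50,
                            alpha=0.1, beta=0.01, window=2, lam=1.0, seed=0):
    """Toy collapsed Gibbs sampler for LDA with a local-density-style bonus:
    the sampling distribution of each token is multiplied by (1 + lam * k),
    where k counts how many words within `window` positions already carry
    each topic. Illustrative sketch only, not the paper's exact model."""
    rng = np.random.default_rng(seed)
    n_dt = np.zeros((len(docs), n_topics))   # document-topic counts
    n_tw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_t = np.zeros(n_topics)                 # total tokens per topic
    z = []                                   # topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]  # remove current assignment from the counts
                n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
                # standard collapsed-Gibbs factor for LDA
                p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) \
                    / (n_t + vocab_size * beta)
                # local density bonus: topics of neighbors in the window
                lo, hi = max(0, i - window), min(len(doc), i + window + 1)
                nbr = np.bincount(np.concatenate([z[d][lo:i], z[d][i+1:hi]]),
                                  minlength=n_topics)
                p *= 1.0 + lam * nbr
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t
                n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    return n_dt, n_tw

# usage: two tiny "documents" over a 6-word vocabulary
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4]]
n_dt, n_tw = gibbs_lda_local_density(docs, n_topics=2, vocab_size=6)
```

Setting `lam=0` recovers plain collapsed Gibbs sampling for LDA; larger `lam` pushes neighboring words toward shared topics, which is the stabilizing effect the abstract describes.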



Acknowledgments

This work was supported by the Basic Research Program of the National Research University Higher School of Economics.

Corresponding author

Correspondence to Sergey I. Nikolenko.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Koltcov, S., Nikolenko, S.I., Koltsova, O., Filippov, V., Bodrunova, S. (2016). Stable Topic Modeling with Local Density Regularization. In: Bagnoli, F., et al. (eds.) Internet Science. INSCI 2016. Lecture Notes in Computer Science, vol. 9934. Springer, Cham. https://doi.org/10.1007/978-3-319-45982-0_16

  • DOI: https://doi.org/10.1007/978-3-319-45982-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45981-3

  • Online ISBN: 978-3-319-45982-0

  • eBook Packages: Computer Science (R0)
