
Stable Topic Modeling with Local Density Regularization

  • Conference paper
Internet Science (INSCI 2016)

Part of the book series: Lecture Notes in Computer Science, volume 9934

Abstract

Topic modeling has emerged over the last decade as a powerful tool for analyzing large text corpora, including Web-based user-generated texts. Topic stability, however, remains a concern: topic models have a very complex optimization landscape with many local maxima, and different runs of the same model can yield very different topics. To add stability to topic modeling, we propose an approach based on local density regularization, in which words in the local context window of a given word have a higher probability of being assigned the same topic as that word. We compare several models with local density regularizers and show that they can improve topic stability while remaining on par with classical models in terms of quality metrics.
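The paper's concrete regularizers are defined in the full text; as an illustrative sketch of the general idea only, the following toy collapsed Gibbs sampler for LDA multiplies each token's sampling distribution by a bonus for topics already assigned to words within a local context window. The function name and the `window` and `lam` parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def gibbs_lda_local_density(docs, n_topics, vocab_size, n_iter=50,
                            alpha=0.1, beta=0.01, window=2, lam=1.0, seed=0):
    """Toy collapsed Gibbs sampler for LDA with a local-density-style bonus:
    the sampling distribution of each token is multiplied by (1 + lam * k),
    where k counts how many words within `window` positions already carry
    each topic. Illustrative sketch only, not the paper's exact model."""
    rng = np.random.default_rng(seed)
    n_dt = np.zeros((len(docs), n_topics))   # document-topic counts
    n_tw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_t = np.zeros(n_topics)                 # total tokens per topic
    z = []                                   # topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]  # remove current assignment from the counts
                n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
                # standard collapsed-Gibbs factor for LDA
                p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) \
                    / (n_t + vocab_size * beta)
                # local density bonus: topics of neighbors in the window
                lo, hi = max(0, i - window), min(len(doc), i + window + 1)
                nbr = np.bincount(np.concatenate([z[d][lo:i], z[d][i+1:hi]]),
                                  minlength=n_topics)
                p *= 1.0 + lam * nbr
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t
                n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    return n_dt, n_tw

# usage: two tiny "documents" over a 6-word vocabulary
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4]]
n_dt, n_tw = gibbs_lda_local_density(docs, n_topics=2, vocab_size=6)
```

Setting `lam=0` recovers plain collapsed Gibbs sampling for LDA; larger `lam` pushes neighboring words toward shared topics, which is the stabilizing effect the abstract describes.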



Acknowledgments

This work was supported by the Basic Research Program of the National Research University Higher School of Economics.

Corresponding author

Correspondence to Sergey I. Nikolenko.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Koltcov, S., Nikolenko, S.I., Koltsova, O., Filippov, V., Bodrunova, S. (2016). Stable Topic Modeling with Local Density Regularization. In: Bagnoli, F., et al. (eds.) Internet Science. INSCI 2016. Lecture Notes in Computer Science, vol. 9934. Springer, Cham. https://doi.org/10.1007/978-3-319-45982-0_16

  • DOI: https://doi.org/10.1007/978-3-319-45982-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45981-3

  • Online ISBN: 978-3-319-45982-0

  • eBook Packages: Computer Science (R0)
