Abstract
Topic modeling is a machine learning technique for discovering semantic topics in a document collection. It typically models each document as a multinomial distribution over latent topics and each topic as a multinomial distribution over words. By exploiting the co-occurrence statistics of words across documents, it infers these distributions, which capture important semantic relationships. Topic modeling has been widely studied in machine learning, text mining, and natural language processing (NLP). This chapter gives an introduction to topic modeling, covering both its fundamental techniques and several of its important applications in NLP.
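To make the two kinds of multinomial distributions concrete, here is a minimal sketch of one standard topic model, latent Dirichlet allocation (LDA), fitted by collapsed Gibbs sampling. It is an illustration only, not the chapter's own method. The function name `lda_gibbs`, the symmetric hyperparameters `alpha` and `beta`, their default values, and the toy corpus are all assumptions chosen for this example. Each token's topic is resampled with weight proportional to `(n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)`, where `n_dk` counts the tokens in document d assigned to topic k, `n_kw` counts how often word w is assigned to topic k, and `n_k` is the total number of tokens assigned to topic k. After sampling, `theta` estimates each document's distribution over topics and `phi` estimates each topic's distribution over words.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical API).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): theta[d][k] estimates P(topic k | document d) and
    phi[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]        # document-topic counts
    n_kw = [[0] * V for _ in range(K)]    # topic-word counts
    n_k = [0] * K                         # tokens per topic

    # Assign every token a random initial topic and record the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of the two sets of multinomials.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical toy corpus: two groups of words that never co-occur.
    vocab = ["ball", "game", "team", "vote", "party", "law"]
    corpus = [
        ["ball", "game", "team", "game"],
        ["team", "ball", "ball", "game"],
        ["vote", "party", "law", "vote"],
        ["law", "party", "party", "vote"],
    ]
    word_id = {w: i for i, w in enumerate(vocab)}
    docs = [[word_id[w] for w in doc] for doc in corpus]
    theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
        print(f"topic {k}:", [vocab[w] for w in top])
    for d, row in enumerate(theta):
        print(f"doc {d}:", [round(p, 2) for p in row])
```

On well-separated toy data like this, the sampler typically places the sports words and the politics words in different topics, because words that co-occur in the same documents reinforce each other through the shared `n_dk` and `n_kw` counts. This pure-Python loop is meant for reading, not for scale; libraries such as gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation` fit LDA with variational inference rather than Gibbs sampling.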
Copyright information
© 2017 Springer Science+Business Media New York
About this entry
Cite this entry
Chen, Z., Liu, B. (2017). Topic Models for NLP Applications. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_906
DOI: https://doi.org/10.1007/978-1-4899-7687-1_906
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science, Reference Module Computer Science and Engineering