Incorporating Probabilistic Knowledge into Topic Models

Yao, Liang; Zhang, Yin; Wei, Baogang; Qian, Hongze; Wang, Yibing

doi:10.1007/978-3-319-18032-8_46

Incorporating Probabilistic Knowledge into Topic Models

Liang Yao¹⁰,
Yin Zhang¹⁰,
Baogang Wei¹⁰,
Hongze Qian¹⁰ &
…
Yibing Wang¹⁰

Conference paper
First Online: 01 January 2015

4383 Accesses
12 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9078))

Abstract

Probabilistic Topic Models could be used to extract low-dimension aspects from document collections. However, such models without any human knowledge often produce aspects that are not interpretable. In recent years, a number of knowledge-based models have been proposed, which allow the user to input prior knowledge of the domain to produce more coherent and meaningful topics. In this paper, we incorporate human knowledge in the form of probabilistic knowledge base into topic models. By combining latent Dirichlet allocation, a widely used topic model with Probase, a large-scale probabilistic knowledge base, we improve the semantic coherence significantly. Our evaluation results will demonstrate the effectiveness of our method.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Andrzejewski, D., Zhu, X.: Latent dirichlet allocation with topic-in-set knowledge. In: Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing, pp. 43–48. Association for Computational Linguistics (2009)
Google Scholar
Andrzejewski, D., Zhu, X., Craven, M.: Incorporating domain knowledge into topic modeling via dirichlet forest priors. In: ICML, pp. 25–32. ACM (2009)
Google Scholar
Andrzejewski, D., Zhu, X., Craven, M., Recht, B.: A framework for incorporating general domain knowledge into latent dirichlet allocation using first-order logic. In: IJCAI (2011)
Google Scholar
Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open information extraction for the web. In: IJCAI, vol. 7, pp. 2670–2676 (2007)
Google Scholar
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. JMLR 3, 993–1022 (2003)
MATH Google Scholar
Bloom, P.: Glue for the mental world (2003)
Google Scholar
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD. ACM (2008)
Google Scholar
Chang, J., Gerrish, S., Wang, C., Boyd-graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: NIPS, pp. 288–296 (2009)
Google Scholar
Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by combining semantic concepts with unsupervised statistical learning. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Heidelberg (2008)
Chapter Google Scholar
Chen, Z., Liu, B.: Topic modeling using topics from many domains, lifelong learning and big data. In: ICML, pp. 703–711 (2014)
Google Scholar
Chen, Z., Mukherjee, A., Liu, B.: Aspect extraction with automated prior knowledge learning. In: ACL, pp. 347–358 (2014)
Google Scholar
Chen, Z., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Discovering coherent topics using general knowledge. In: CIKM, pp. 209–218. ACM (2013)
Google Scholar
Chen, Z., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Exploiting domain knowledge in aspect extraction. In: EMNLP, pp. 1655–1667 (2013)
Google Scholar
Chen, Z., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Leveraging multi-domain prior knowledge in topic models. In: IJCAI. AAAI Press (2013)
Google Scholar
Chuang, J., Gupta, S., Manning, C., Heer, J.: Topic model diagnostics: assessing domain relevance via topical alignment. In: ICML, pp. 612–620 (2013)
Google Scholar
Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.M., Shaked, T., Soderland, S., Weld, D.S., Yates, A.: Web-scale information extraction in knowitall:(preliminary results). In: WWW, pp. 100–110. ACM (2004)
Google Scholar
Hofmann, T.: Probabilistic latent semantic indexing. In: SIGIR, pp. 50–57. ACM (1999)
Google Scholar
Mimno, D., Wallach, H.M., Talley, E., Leenders, M., McCallum, A.: Optimizing semantic coherence in topic models. In: EMNLP, pp. 262–272 (2011)
Google Scholar
Murphy, G.L.: The big book of concepts. MIT press (2002)
Google Scholar
Newman, D., Lau, J.H., Grieser, K., Baldwin, T.: Automatic evaluation of topic coherence. In: NAACL-HLT, pp. 100–108. Association for Computational Linguistics (2010)
Google Scholar
Song, Y., Wang, H., Wang, Z., Li, H., Chen, W.: Short text conceptualization using a probabilistic knowledgebase. In: IJCAI, pp. 2330–2336. AAAI Press (2011)
Google Scholar
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: WWW, pp. 697–706. ACM (2007)
Google Scholar
Wallach, H.M., Minmo, D., McCallum, A.: Rethinking lda: why priors matter. In: NIPS (2009)
Google Scholar
Wang, Z., Wang, H., Xiao, Y., Wen, J.R.: How to make a semantic network probabilistic. In: TechReport. MSR-TR-2014-59 (2014)
Google Scholar
Wu, W., Li, H., Wang, H., Zhu, K.Q.: Probase: a probabilistic taxonomy for text understanding. In: SIGMOD, pp. 481–492. ACM (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, People’s Republic of China
Liang Yao, Yin Zhang, Baogang Wei, Hongze Qian & Yibing Wang

Authors

Liang Yao
View author publications
You can also search for this author in PubMed Google Scholar
Yin Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Baogang Wei
View author publications
You can also search for this author in PubMed Google Scholar
Hongze Qian
View author publications
You can also search for this author in PubMed Google Scholar
Yibing Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yin Zhang .

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tru Cao
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Japan Advanced Institute of Science and Technology, Nomi City, Japan
Tu-Bao Ho
The University of Hong Kong, Hong Kong, Hong Kong SAR
David Cheung
Osaka University, Osaka, Japan
Hiroshi Motoda

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yao, L., Zhang, Y., Wei, B., Qian, H., Wang, Y. (2015). Incorporating Probabilistic Knowledge into Topic Models. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_46

Download citation

DOI: https://doi.org/10.1007/978-3-319-18032-8_46
Published: 09 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics