Probabilistic Topic Modelling with Semantic Graph

  • Conference paper
Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

In this paper we propose a novel framework, topic model with semantic graph (TMSG), which couples a topic model with the rich knowledge in DBpedia. We first extract disambiguated entities from the document collection using a document entity linking system, namely DBpedia Spotlight. From these entities, two types of entity graphs are built from DBpedia to capture local and global contextual knowledge, respectively. Given this semantic graph representation of the documents, we propagate the inherent topic-document distribution with the disambiguated entities of the semantic graphs. Experiments on two real-world datasets show that TMSG significantly outperforms state-of-the-art techniques, namely the author-topic model (ATM) and the topic model with biased propagation (TMBP).
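
The propagation idea in the abstract can be illustrated with a minimal sketch. It is not the TMSG update rule, which the abstract does not state; it is a generic smoothing step that assumes each entity's topic distribution is the weighted average of the documents linking to it, and that each document is then mixed with its entities. The function names, the mixing weight `xi`, and the input layout are all hypothetical.

```python
from typing import Dict, List


def normalize(vec: List[float]) -> List[float]:
    """Scale a non-negative vector so it sums to 1 (uniform if all zeros)."""
    total = sum(vec)
    if total == 0:
        return [1.0 / len(vec)] * len(vec)
    return [v / total for v in vec]


def propagate(
    theta: Dict[str, List[float]],
    doc_entities: Dict[str, Dict[str, float]],
    xi: float = 0.3,
) -> Dict[str, List[float]]:
    """One illustrative propagation step (an assumption, not the paper's formula).

    theta        : document id -> topic distribution
    doc_entities : document id -> {entity id: link weight}
    xi           : hypothetical mixing weight given to the entity side
    """
    num_topics = len(next(iter(theta.values())))

    # Step 1: an entity's topic distribution is the weighted average of the
    # distributions of the documents that mention it.
    entity_acc: Dict[str, List[float]] = {}
    for doc, ents in doc_entities.items():
        for ent, weight in ents.items():
            acc = entity_acc.setdefault(ent, [0.0] * num_topics)
            for k in range(num_topics):
                acc[k] += weight * theta[doc][k]
    entity_topics = {ent: normalize(acc) for ent, acc in entity_acc.items()}

    # Step 2: each document is mixed with the average distribution of its
    # linked entities; documents without entities are left unchanged.
    new_theta: Dict[str, List[float]] = {}
    for doc, dist in theta.items():
        ents = doc_entities.get(doc, {})
        total_weight = sum(ents.values())
        if total_weight == 0:
            new_theta[doc] = list(dist)
            continue
        mixed = []
        for k in range(num_topics):
            ent_part = sum(w * entity_topics[e][k] for e, w in ents.items())
            mixed.append((1 - xi) * dist[k] + xi * ent_part / total_weight)
        new_theta[doc] = normalize(mixed)
    return new_theta
```

In a full model, a step of this kind would typically alternate with re-estimation of the topic model itself; here it only shows how documents that share an entity are pulled towards similar topic distributions.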

Notes

  1. http://wiki.dbpedia.org/

  2. https://developers.google.com/freebase/

  3. https://github.com/dbpedia-spotlight/dbpedia-spotlight

  4. http://www.informatik.uni-trier.de/~ley/db/

  5. http://qwone.com/~jason/20Newsgroups/

Acknowledgements

We thank the anonymous reviewer for their helpful comments. We acknowledge support from the EPSRC funded project named A Situation Aware Information Infrastructure Project (EP/L026015) and the Integrated Multimedia City Data (IMCD), a project within the ESRC-funded Urban Big Data Centre (ES/L011921/1). This work was also partly supported by NSF grant #61572223. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the sponsor.

Corresponding author

Correspondence to Long Chen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, L., Jose, J.M., Yu, H., Yuan, F., Zhang, H. (2016). Probabilistic Topic Modelling with Semantic Graph. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_18

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)
