Finding Co-occurring Topics in Wikipedia Article Segments

  • Conference paper
The Emergence of Digital Libraries – Research and Practices (ICADL 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8839)

Abstract

Wikipedia is the largest online encyclopedia, and its articles form a rich knowledge and semantic resource. When the same topic appears in different articles, those articles are related through that topic. Finding such co-occurring topics is useful for improving the accuracy of querying and clustering, and also for contrasting related articles. Existing work on topic alignment and topic relevance detection is based on term occurrence. In our research, we discuss how to incorporate the latent topics present in article segments, obtained with Latent Dirichlet Allocation (LDA), to detect topic relevance. We also study how segment proximities, arising from segment ordering and hyperlinks, should be incorporated into topic detection and alignment. Experimental results show that our method can find and distinguish three types of co-occurrence.
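The abstract does not specify how latent topics or segment proximities are scored, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each segment has already been given a topic distribution by some trained LDA model (the vectors below are toy values), treats a topic as present in a segment when its probability meets a hypothetical threshold of 0.2, and models proximity by blending a segment's distribution with those of its neighbors by order and by (hypothetical) hyperlinked segments within the same article. The helper names `dominant_topics`, `co_occurring_topics`, and `smoothed_distribution`, the threshold, and the blending weight `alpha` are all illustrative choices.

```python
import math
from typing import Dict, List, Set

# A topic distribution (theta) over K latent topics for one segment,
# assumed to come from an already-trained LDA model (not shown here).
Theta = List[float]


def cosine(p: Theta, q: Theta) -> float:
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def dominant_topics(theta: Theta, threshold: float = 0.2) -> Set[int]:
    """Indices of topics whose probability meets a hypothetical threshold."""
    return {k for k, p in enumerate(theta) if p >= threshold}


def co_occurring_topics(a: Theta, b: Theta, threshold: float = 0.2) -> Set[int]:
    """Topics that are prominent in both segments."""
    return dominant_topics(a, threshold) & dominant_topics(b, threshold)


def smoothed_distribution(
    thetas: List[Theta], i: int, links: Dict[int, Set[int]], alpha: float = 0.5
) -> Theta:
    """Blend segment i with the mean distribution of its proximate segments.

    Proximity here is an assumption: the previous and next segment in
    article order, plus any in-range segment indices listed in links[i]
    (standing in for hyperlinks between segments).
    """
    n = len(thetas)
    neighbors = {j for j in (i - 1, i + 1) if 0 <= j < n}
    neighbors |= {j for j in links.get(i, set()) if 0 <= j < n}
    neighbors.discard(i)
    if not neighbors:
        return list(thetas[i])
    k = len(thetas[i])
    mean = [sum(thetas[j][t] for j in neighbors) / len(neighbors) for t in range(k)]
    return [(1 - alpha) * thetas[i][t] + alpha * mean[t] for t in range(k)]


if __name__ == "__main__":
    # Toy distributions over K = 3 topics for the segments of two articles.
    article_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    article_b = [[0.6, 0.1, 0.3], [0.05, 0.05, 0.9]]
    for i in range(len(article_a)):
        sa = smoothed_distribution(article_a, i, links={})
        for j in range(len(article_b)):
            sb = smoothed_distribution(article_b, j, links={})
            shared = co_occurring_topics(sa, sb)
            print(f"A[{i}] vs B[{j}]: shared={sorted(shared)} cos={cosine(sa, sb):.3f}")
```

Smoothing before comparison is just one simple way to let ordering and link proximity influence which topics count as shared; the paper may combine proximities with topic relevance quite differently.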

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, R., Wu, J., Iwaihara, M. (2014). Finding Co-occurring Topics in Wikipedia Article Segments. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_26

  • DOI: https://doi.org/10.1007/978-3-319-12823-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12822-1

  • Online ISBN: 978-3-319-12823-8

  • eBook Packages: Computer Science, Computer Science (R0)