Abstract
Wikipedia is the largest online encyclopedia, and its articles form a rich knowledge and semantic resource. When the same topic appears in different articles, those articles are topically related. Finding such co-occurring topics is useful for improving the accuracy of querying and clustering, and for contrasting related articles. Existing work on topic alignment and topic relevance detection is based on term occurrence. In this paper, we discuss how to detect topic relevance by incorporating the latent topics of article segments, obtained with Latent Dirichlet Allocation (LDA). We also study how segment proximities, arising from segment ordering and hyperlinks, can be incorporated into topic detection and alignment. Experimental results show that our method can find and distinguish three types of co-occurrence.
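The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each article segment already has a topic distribution (for example, from a fitted LDA model), smooths each distribution with its neighbouring segments to mimic proximity from segment ordering, and compares segments across two articles with a Jensen-Shannon based similarity. The function names, the smoothing weight, the threshold, and the toy distributions are all hypothetical, and hyperlink proximity is not modelled.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions; lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def smooth_by_order(thetas, weight=0.2):
    """Mix each segment's topic distribution with the mean of its adjacent segments.

    `weight` is an assumed tuning parameter, not a value from the paper.
    """
    smoothed = []
    for i, theta in enumerate(thetas):
        neighbours = [thetas[j] for j in (i - 1, i + 1) if 0 <= j < len(thetas)]
        if not neighbours:
            smoothed.append(list(theta))
            continue
        mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        smoothed.append([(1 - weight) * t + weight * m for t, m in zip(theta, mean)])
    return smoothed


def cooccurring_segments(article_a, article_b, threshold=0.9):
    """Return (i, j, similarity) for cross-article segment pairs at or above threshold."""
    a, b = smooth_by_order(article_a), smooth_by_order(article_b)
    pairs = []
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            sim = 1.0 - js_divergence(p, q)
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


# Toy per-segment topic distributions over three topics (made-up values).
article_a = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
article_b = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]

pairs = cooccurring_segments(article_a, article_b)
for i, j, sim in pairs:
    print(f"segment {i} of A ~ segment {j} of B (similarity {sim:.3f})")
```

In this toy setup only the two opening segments, which both concentrate on the first topic, pass the threshold. A term-occurrence baseline would instead compare word counts directly, which is the contrast the abstract draws.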
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, R., Wu, J., Iwaihara, M. (2014). Finding Co-occurring Topics in Wikipedia Article Segments. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds) The Emergence of Digital Libraries – Research and Practices. ICADL 2014. Lecture Notes in Computer Science, vol 8839. Springer, Cham. https://doi.org/10.1007/978-3-319-12823-8_26
DOI: https://doi.org/10.1007/978-3-319-12823-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12822-1
Online ISBN: 978-3-319-12823-8
eBook Packages: Computer Science (R0)