CMIC@INEX 2008: Link-the-Wiki Track

Darwish, Kareem

doi:10.1007/978-3-642-03761-0_34

Kareem Darwish

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5631)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task, I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well on this task, achieving the highest precision at low recall levels. Post-hoc experiments showed that constructing queries from titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using as anchors many phrases that were not central to the articles.
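
As a rough illustration of the two ideas summarized in the abstract, the Python sketch below builds a query from an article's title, keywords, and keyphrases, and filters candidate keyphrases against a set of existing Wikipedia titles. It is not the implementation used for the submitted runs: the function names, the normalization step, and the double-quote phrase syntax are all assumptions made for illustration, and a real IR engine would have its own query language.

```python
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, lenient matching rule)."""
    return " ".join(text.lower().split())


def build_query(title: str, keywords: Iterable[str], keyphrases: Iterable[str]) -> str:
    """Join title, keywords, and quoted keyphrases into one query string.

    Quoting phrases with double quotes is hypothetical; adapt it to the
    query syntax of whichever IR engine is actually used.
    """
    parts = [title]
    parts.extend(keywords)
    parts.extend(f'"{phrase}"' for phrase in keyphrases)
    return " ".join(parts)


def filter_keyphrases(candidates: Iterable[str], wiki_titles: Set[str]) -> List[str]:
    """Keep only candidates that match an existing article title after normalization."""
    known = {normalize(title) for title in wiki_titles}
    return [c for c in candidates if normalize(c) in known]
```

For example, `filter_keyphrases(["Information retrieval", "generic engine"], {"Information retrieval", "Cairo"})` keeps only the first candidate, since no article is titled "generic engine". Filtering of this kind checks only that a phrase names an article, not that the phrase is central to the source article, which is the weakness the abstract suspects.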

Author information

Authors and Affiliations

Cairo Microsoft Innovation Center, Bldg B115, Smart Village Km. 28 Cairo/Alexandria Desert Rd., Abu Rawash, Egypt
Kareem Darwish

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, 4001, Brisbane, Qld, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

About this paper

Cite this paper

Darwish, K. (2009). CMIC@INEX 2008: Link-the-Wiki Track. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_34

DOI: https://doi.org/10.1007/978-3-642-03761-0_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science, Computer Science (R0)
