Abstract
We describe our participation in the WebCLEF 2007 task, which targets snippet retrieval from web data. Our system ranks snippets by a simple similarity-based centrality score, inspired by web page ranking algorithms. We experimented with two retrieval units (sentences and paragraphs) and two similarity functions for the centrality computation (word overlap and cosine similarity). Paragraphs combined with cosine similarity performed best, with precision around 20% and recall around 25% according to human assessments of the first 7,000 bytes of the response for each topic.
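The idea of similarity-based centrality can be illustrated with a minimal sketch. It is not the system described in the paper: the abstract does not give the exact formulas, so the code below assumes a degree-style centrality (each snippet's score is the sum of its similarities to all other snippets), raw term-frequency vectors for cosine similarity, and Jaccard overlap of word sets as the "word overlap" function. All function names (tokenize, cosine, word_overlap, rank_by_centrality) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_overlap(a, b):
    """Jaccard overlap of the two word sets (an assumed definition)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rank_by_centrality(snippets, similarity=cosine):
    """Rank snippets by summed similarity to all other snippets.

    Assumption: centrality is a plain sum of pairwise similarities;
    the paper may weight or iterate this differently.
    """
    bags = [Counter(tokenize(s)) for s in snippets]
    scores = [
        sum(similarity(bag_i, bag_j) for j, bag_j in enumerate(bags) if j != i)
        for i, bag_i in enumerate(bags)
    ]
    order = sorted(range(len(snippets)), key=lambda i: -scores[i])
    return [(snippets[i], scores[i]) for i in order]


snippets = [
    "centrality ranks web snippets",
    "web snippets are ranked by centrality",
    "bananas are yellow",
]
ranked = rank_by_centrality(snippets)
ranked_overlap = rank_by_centrality(snippets, similarity=word_overlap)
```

The retrieval unit is simply what is passed in as `snippets`: sentences or paragraphs. In this toy input the off-topic snippet shares almost no vocabulary with the others, so it receives the lowest centrality under either similarity function.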
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jijkoun, V., de Rijke, M. (2008). Using Centrality to Rank Web Snippets. In: Peters, C., et al. Advances in Multilingual and Multimodal Information Retrieval. CLEF 2007. Lecture Notes in Computer Science, vol 5152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85760-0_94
DOI: https://doi.org/10.1007/978-3-540-85760-0_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85759-4
Online ISBN: 978-3-540-85760-0
eBook Packages: Computer Science, Computer Science (R0)