Topic Difficulty Prediction in Entity Ranking

Vercoustre, Anne-Marie; Pehcevski, Jovan; Naumovski, Vladimir

doi:10.1007/978-3-642-03761-0_29

Anne-Marie Vercoustre,
Jovan Pehcevski &
Vladimir Naumovski

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5631)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction, where the goal is to tag the names of the entities in documents, entity ranking focuses on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we show that knowledge of the predicted class of topic difficulty can be used to further improve entity ranking performance. To predict topic difficulty, we build a classifier that uses features extracted from an INEX topic definition to assign the topic to an experimentally pre-determined class. This knowledge is then used to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking.
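
The pipeline outlined in the abstract (extract features from a topic definition, predict a difficulty class, then pick retrieval parameters for that class) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the `Topic` fields, the three features, the toy training data, the nearest-centroid classifier, the class labels "easy" and "hard", and the parameter names `link_weight` and `category_weight` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Features = Tuple[int, int, int]


@dataclass
class Topic:
    """Hypothetical, simplified view of a topic definition."""
    title: str
    categories: List[str]
    examples: List[str]


def extract_features(topic: Topic) -> Features:
    """Illustrative features only: title length, category count, example count."""
    return (len(topic.title.split()), len(topic.categories), len(topic.examples))


def train_centroids(
    labelled: Sequence[Tuple[Features, str]]
) -> Dict[str, Tuple[float, ...]]:
    """Average the feature vectors of each difficulty class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def predict(centroids: Dict[str, Tuple[float, ...]], features: Features) -> str:
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def distance(label: str) -> float:
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=distance)


# Made-up retrieval parameter settings per predicted class (assumption).
PARAMS_BY_CLASS = {
    "easy": {"link_weight": 0.1, "category_weight": 0.8},
    "hard": {"link_weight": 0.3, "category_weight": 0.5},
}


def choose_parameters(
    centroids: Dict[str, Tuple[float, ...]], topic: Topic
) -> Dict[str, float]:
    """Select the parameter set tied to the topic's predicted class."""
    return PARAMS_BY_CLASS[predict(centroids, extract_features(topic))]


# Toy labelled topics (features, class); invented purely for illustration.
TRAINING = [
    ((2, 1, 3), "easy"),
    ((3, 1, 2), "easy"),
    ((7, 2, 1), "hard"),
    ((8, 3, 1), "hard"),
]

centroids = train_centroids(TRAINING)
topic = Topic(
    title="Italian Nobel prize winners",
    categories=["Nobel laureates"],
    examples=["Dario Fo", "Renato Dulbecco", "Carlo Rubbia"],
)
print(choose_parameters(centroids, topic))
```

The design point the sketch tries to capture is that the prediction uses only the topic definition, so the parameter choice can be made before any retrieval run. A real system would replace the nearest-centroid stand-in with a classifier trained on measured per-topic performance.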

Author information

Authors and Affiliations

INRIA, Rocquencourt, France
Anne-Marie Vercoustre
Faculty of Management and Information Technologies, Skopje, Macedonia
Jovan Pehcevski & Vladimir Naumovski

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, 4001, Brisbane, Qld, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

About this paper

Cite this paper

Vercoustre, AM., Pehcevski, J., Naumovski, V. (2009). Topic Difficulty Prediction in Entity Ranking. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_29

DOI: https://doi.org/10.1007/978-3-642-03761-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science (R0)
