Abstract
Entity ranking has recently emerged as a research field that aims to retrieve entities as answers to a query. Unlike entity extraction, where the goal is to tag the names of entities in documents, entity ranking focuses on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we show that knowledge of the predicted class of topic difficulty can be used to further improve entity ranking performance. To predict topic difficulty, we build a classifier that uses features extracted from an INEX topic definition to assign the topic to an experimentally pre-determined class. This knowledge is then used to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking.
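The pipeline outlined in the abstract (extract features from a topic definition, predict a difficulty class, then pick retrieval parameters for that class) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Topic` fields, the feature names, the class labels, the hand-written rule that stands in for a trained classifier, and the parameter names and values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topic:
    """Hypothetical, simplified view of a topic definition."""
    title: str
    categories: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)


def extract_features(topic: Topic) -> Dict[str, int]:
    # Illustrative features only; the paper's actual feature set is not given here.
    return {
        "title_words": len(topic.title.split()),
        "num_categories": len(topic.categories),
        "num_examples": len(topic.examples),
    }


def predict_difficulty(features: Dict[str, int]) -> str:
    # Hand-written rule standing in for a trained classifier (assumption).
    if features["num_categories"] >= 2 and features["title_words"] <= 3:
        return "easy"
    if features["num_categories"] == 0 or features["title_words"] > 6:
        return "difficult"
    return "moderate"


# Hypothetical per-class retrieval parameters; names and values are placeholders.
PARAMS_BY_CLASS: Dict[str, Dict[str, float]] = {
    "easy": {"alpha": 0.1, "beta": 0.8},
    "moderate": {"alpha": 0.2, "beta": 0.6},
    "difficult": {"alpha": 0.4, "beta": 0.4},
}


def retrieval_parameters(topic: Topic) -> Dict[str, float]:
    """Select retrieval parameters from the predicted difficulty class."""
    difficulty = predict_difficulty(extract_features(topic))
    return PARAMS_BY_CLASS[difficulty]


if __name__ == "__main__":
    topic = Topic(title="art museums", categories=["museums", "art"])
    print(predict_difficulty(extract_features(topic)), retrieval_parameters(topic))
```

In a real system the rule in `predict_difficulty` would be replaced by a classifier trained on topics whose difficulty class was determined experimentally, and the lookup table would hold parameter values tuned separately for each class.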
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vercoustre, AM., Pehcevski, J., Naumovski, V. (2009). Topic Difficulty Prediction in Entity Ranking. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_29
DOI: https://doi.org/10.1007/978-3-642-03761-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science (R0)