
Topic Difficulty Prediction in Entity Ranking

  • Conference paper
Advances in Focused Retrieval (INEX 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5631)

Abstract

Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction, where the goal is to tag the names of entities in documents, entity ranking focuses on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we show that knowledge of the predicted class of topic difficulty can be used to further improve entity ranking performance. To predict topic difficulty, we build a classifier that uses features extracted from an INEX topic definition to assign the topic to an experimentally pre-determined class. This knowledge is then used to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking.
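
The pipeline outlined in the abstract (extract features from a topic definition, predict a difficulty class, then choose retrieval parameters for that class) might be sketched as follows. This is only an illustration under stated assumptions: the feature names, the hand-written threshold rule standing in for a trained classifier, the class labels, and the parameter names `alpha` and `beta` with their values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Topic:
    """Simplified stand-in for an INEX entity ranking topic definition."""
    title: str
    categories: List[str]
    examples: List[str]


def extract_features(topic: Topic) -> Dict[str, int]:
    # Hypothetical features; the paper's actual feature set is not given here.
    return {
        "title_words": len(topic.title.split()),
        "num_categories": len(topic.categories),
        "num_examples": len(topic.examples),
    }


def predict_difficulty(features: Dict[str, int]) -> str:
    # Hypothetical threshold rule used in place of a trained classifier.
    if features["num_categories"] >= 2 and features["title_words"] <= 3:
        return "easy"
    if features["num_examples"] == 0 or features["title_words"] > 6:
        return "difficult"
    return "medium"


# Hypothetical per-class retrieval parameters (names and values are assumed).
PARAMS_BY_CLASS: Dict[str, Dict[str, float]] = {
    "easy": {"alpha": 0.1, "beta": 0.8},
    "medium": {"alpha": 0.2, "beta": 0.6},
    "difficult": {"alpha": 0.4, "beta": 0.4},
}


def retrieval_parameters(topic: Topic) -> Dict[str, float]:
    """Pick retrieval parameters from the predicted difficulty class."""
    return PARAMS_BY_CLASS[predict_difficulty(extract_features(topic))]


if __name__ == "__main__":
    topic = Topic(
        title="Italian Nobel prize winners",
        categories=["nobel laureates", "italian people"],
        examples=["Dario Fo"],
    )
    print(predict_difficulty(extract_features(topic)))
    print(retrieval_parameters(topic))
```

In a real system the threshold rule would be replaced by a classifier trained on topics whose difficulty class was determined experimentally, and the parameter table would hold the values found to work best for each class.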

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vercoustre, AM., Pehcevski, J., Naumovski, V. (2009). Topic Difficulty Prediction in Entity Ranking. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_29

  • DOI: https://doi.org/10.1007/978-3-642-03761-0_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03760-3

  • Online ISBN: 978-3-642-03761-0

  • eBook Packages: Computer Science, Computer Science (R0)
