Investigation of Passage Based Ranking Models to Improve Document Retrieval

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 976))

Abstract

Passage retrieval deals with identifying and retrieving small but explanatory portions of a document that answer a user’s query. In this paper, we focus on improving document ranking by using different kinds of passage-based evidence. Several similarity measures were evaluated and a more in-depth analysis was undertaken into the effect of varying specific. We have also explored the notion of query difficulty to understand whether the best-performing passage-based approach improves or harms the performance of certain queries. Experimental results indicate that, for the passage-level technique, the worst-performing queries are damaged slightly and those that perform well are boosted for the WebAP collection. However, our rank-based similarity function boosted the performance of the difficult queries in the Ohsumed collection.
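
The general idea of ranking documents by passage-level evidence can be illustrated with a minimal sketch. It is not the similarity functions evaluated in the paper: the fixed-length overlapping word windows, the cosine similarity over raw term counts, the "best passage wins" rule, and all function names and parameters below are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def split_into_passages(text, size=50, overlap=25):
    """Split text into overlapping word windows (an assumed passage definition)."""
    words = text.lower().split()
    step = size - overlap
    # Always return at least one window, even for very short documents.
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]


def cosine(query_terms, passage_terms):
    """Cosine similarity between raw term-frequency vectors."""
    q, p = Counter(query_terms), Counter(passage_terms)
    dot = sum(q[t] * p[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in p.values())))
    return dot / norm if norm else 0.0


def best_passage_score(query, text, size=50, overlap=25):
    """Score a document by its single best-matching passage (assumed rule)."""
    q = query.lower().split()
    return max(cosine(q, p) for p in split_into_passages(text, size, overlap))


def rank_documents(query, docs, size=50, overlap=25):
    """Return document ids ordered by descending best-passage score."""
    scores = {doc_id: best_passage_score(query, text, size, overlap)
              for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "passage retrieval improves document ranking " + "filler " * 100,
        "b": "cooking recipes " * 60,
    }
    print(rank_documents("passage retrieval", docs, size=10, overlap=5))
```

In this toy setting a long document with one short relevant region scores higher on its best passage than on its full text, which is the intuition behind using passage evidence for document-level ranking.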

Acknowledgements

This work is supported by the Irish Research Council Employment Based Programme.

Author information

Correspondence to Ghulam Sarwar.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sarwar, G., O’Riordan, C., Newell, J. (2019). Investigation of Passage Based Ranking Models to Improve Document Retrieval. In: Fred, A., et al. Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2017. Communications in Computer and Information Science, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-15640-4_6

  • DOI: https://doi.org/10.1007/978-3-030-15640-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15639-8

  • Online ISBN: 978-3-030-15640-4

  • eBook Packages: Computer Science (R0)
