Completely-Arbitrary Passage Retrieval in Language Modeling Approach

Na, Seung-Hoon; Kang, In-Su; Lee, Ye-Ha; Lee, Jong-Hyeok

doi:10.1007/978-3-540-68636-1_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4993)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Passage retrieval has been expected to be an alternative method for resolving the length-normalization problem, since passages are more uniform in length and topic than documents. An important issue in passage retrieval is determining the type of passage. Among the several passage types, the arbitrary passage, which varies dynamically with the query, has shown the best performance. However, the previous arbitrary passage type has not been fully examined, since it still imposes a fixed-length restriction such as n consecutive words. This paper proposes a new type of passage, the completely-arbitrary passage, obtained by eliminating all restrictions on both passage length and starting position, thereby fully relaxing the original arbitrary passage type. The main advantage of completely-arbitrary passages is that passage retrieval can properly exploit the proximity of query terms, which non-completely-arbitrary passages cannot clearly support. Extensive experimental results on six standard TREC test collections show that, in the context of language modeling approaches, passage retrieval using completely-arbitrary passages significantly improves over document retrieval as well as over passage retrieval using previous non-completely-arbitrary passages.
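The idea of a completely-arbitrary passage can be illustrated with a brute-force sketch that scores every contiguous span of a document, with no restriction on start position or length, and keeps the best one. The abstract does not give the scoring function, so this sketch assumes a Dirichlet-smoothed query likelihood; the function names, the `mu` parameter, and the `collection_prob` table are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def passage_log_likelihood(query, passage_counts, passage_len, collection_prob, mu=100.0):
    """Query log-likelihood of one passage under Dirichlet smoothing (assumed model)."""
    score = 0.0
    for term in query:
        # Back off to a tiny collection probability for terms missing from the table.
        p_c = collection_prob.get(term, 1e-9)
        score += math.log((passage_counts.get(term, 0) + mu * p_c) / (passage_len + mu))
    return score


def best_completely_arbitrary_passage(query, doc_tokens, collection_prob, mu=100.0):
    """Score every contiguous span (any start, any length); return (score, start, end).

    `end` is exclusive. This is an O(n^2) enumeration for illustration only.
    """
    best = (-math.inf, 0, 0)
    for start in range(len(doc_tokens)):
        counts = Counter()
        for end in range(start, len(doc_tokens)):
            counts[doc_tokens[end]] += 1  # grow the span by one token
            score = passage_log_likelihood(
                query, counts, end - start + 1, collection_prob, mu
            )
            if score > best[0]:
                best = (score, start, end + 1)
    return best
```

Because short spans that contain all query terms are penalized less by the length term in the denominator, the best span tends to be the region where the query terms occur close together, which is the proximity effect the abstract describes.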

Author information

Authors and Affiliations

Department of Computer Science, POSTECH, AITrc, Republic of Korea
Seung-Hoon Na, Ye-Ha Lee & Jong-Hyeok Lee
Korea Institute of Science and Technology Information (KISTI), Republic of Korea
In-Su Kang

Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

Cite this paper

Na, SH., Kang, IS., Lee, YH., Lee, JH. (2008). Completely-Arbitrary Passage Retrieval in Language Modeling Approach. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_3

DOI: https://doi.org/10.1007/978-3-540-68636-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68633-0
Online ISBN: 978-3-540-68636-1
eBook Packages: Computer Science, Computer Science (R0)
