Query-Based Inter-document Similarity Using Probabilistic Co-relevance Model

Na, Seung-Hoon; Kang, In-Su; Lee, Jong-Hyeok

doi:10.1007/978-3-540-78646-7_79

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Inter-document similarity is the critical information that determines whether cluster-based retrieval improves on the baseline. However, inter-document similarity has not been investigated theoretically, even though such work could provide a principle for defining better similarities in a well-motivated way. To develop such a theory, this paper starts by pursuing an ideal inter-document similarity that optimally satisfies the cluster hypothesis. We propose a probabilistic principle for inter-document similarity: the optimal similarity of two documents should be proportional to the probability that they are co-relevant to an arbitrary query. Under this principle, the study of inter-document similarity becomes the problem of estimating the co-relevance model of documents. Furthermore, we find that the optimal inter-document similarity should be defined using queries, not terms, as its basic unit, that is, as a query-based similarity. We rigorously derive a novel query-based similarity from the co-relevance model, without any heuristics. Experimental results show that the new query-based inter-document similarity significantly improves on the previously used term-based similarity under Voorhees' evaluation measure.
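
The co-relevance principle stated in the abstract can be illustrated with a minimal sketch. This is not the estimator derived in the paper, which is not reproduced on this page. It assumes, purely for illustration, that two documents are conditionally independent given the query, so that the co-relevance probability reduces to a sum over queries of P(q) · P(R|d1,q) · P(R|d2,q). The function name `co_relevance_similarity`, the query prior, and all probabilities below are hypothetical.

```python
from typing import Dict


def co_relevance_similarity(
    rel_a: Dict[str, float],
    rel_b: Dict[str, float],
    query_prior: Dict[str, float],
) -> float:
    """Illustrative co-relevance score for two documents.

    rel_a / rel_b map each query to an assumed probability that the
    document is relevant to it; query_prior maps each query to P(q).
    Assumption (ours, not the paper's): relevance of the two documents
    is independent given the query.
    """
    return sum(
        p_q * rel_a.get(q, 0.0) * rel_b.get(q, 0.0)
        for q, p_q in query_prior.items()
    )


# Hypothetical toy data: two queries, three documents.
prior = {"q1": 0.5, "q2": 0.5}
doc_a = {"q1": 0.8, "q2": 0.1}
doc_b = {"q1": 0.6, "q2": 0.2}
doc_c = {"q1": 0.0, "q2": 0.9}

sim_ab = co_relevance_similarity(doc_a, doc_b, prior)
sim_ac = co_relevance_similarity(doc_a, doc_c, prior)
```

In this toy setting, documents that tend to be relevant to the same queries (`doc_a` and `doc_b`) score higher than documents relevant to different queries (`doc_a` and `doc_c`), which is the behavior the cluster hypothesis asks of a similarity. The basic unit of comparison here is the query rather than the term.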

References

Rijsbergen, C.J.V.: Information Retrieval. Butterworth-Heinemann (1979)
Croft, W.B.: A model of cluster searching based on classification. Information Systems (5), 189–195 (1980)
Voorhees, E.M.: The cluster hypothesis revisited. In: SIGIR 1985, pp. 188–196 (1985)
Liu, X.: Cluster-based retrieval using language models. In: SIGIR 2004, pp. 186–193 (2004)
Kurland, O., Lee, L.: Corpus structure, language models, and ad hoc information retrieval. In: SIGIR 2004, pp. 194–201 (2004)
Roelleke, T., Wang, J.: A parallel derivation of probabilistic information retrieval models. In: SIGIR 2006, pp. 107–114 (2006)

Author information

Authors and Affiliations

POSTECH, Pohang, South Korea
Seung-Hoon Na & Jong-Hyeok Lee
KISTI, Daejeon, South Korea
In-Su Kang

Editor information

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

About this paper

Cite this paper

Na, SH., Kang, IS., Lee, JH. (2008). Query-Based Inter-document Similarity Using Probabilistic Co-relevance Model. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_79

DOI: https://doi.org/10.1007/978-3-540-78646-7_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer Science, Computer Science (R0)
