Abstract
Retrieval accuracy can be improved by considering which document types should be filtered out and which should be ranked higher in the result list. Document type can therefore serve as a key factor in building a re-ranking retrieval model. We take a simple approach to incorporating document type into the retrieval process: we adapt the BM25 scoring function to weight term frequency according to document type, and we use a Bayesian approach to estimate the appropriate weight for each type. Experimental results show that our approach improves search precision by as much as 19%.
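The abstract does not state the exact formulas, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes that the weight of a document type is the smoothed ratio P(type | relevant) / P(type), which by Bayes' rule equals P(relevant | type) / P(relevant), and that this weight simply multiplies the term frequency inside a standard BM25 term score. The function names, the smoothing constant `alpha`, and the defaults `k1 = 1.2` and `b = 0.75` are all illustrative assumptions.

```python
import math
from collections import Counter


def type_weights(judged, alpha=1.0):
    """Estimate one weight per document type from relevance judgments.

    judged: list of (doc_type, is_relevant) pairs.
    Assumed weight: P(type | relevant) / P(type), with add-alpha smoothing.
    """
    types = sorted({t for t, _ in judged})
    all_counts = Counter(t for t, _ in judged)
    rel_counts = Counter(t for t, rel in judged if rel)
    n_all = len(judged)
    n_rel = sum(rel_counts.values())
    k = len(types)
    weights = {}
    for t in types:
        p_type_given_rel = (rel_counts[t] + alpha) / (n_rel + alpha * k)
        p_type = (all_counts[t] + alpha) / (n_all + alpha * k)
        weights[t] = p_type_given_rel / p_type
    return weights


def bm25_typed(query_terms, doc_terms, doc_type, weights,
               df, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score with term frequency scaled by the document-type weight.

    df maps a term to its document frequency; unknown types get weight 1.0.
    """
    tf = Counter(doc_terms)
    w = weights.get(doc_type, 1.0)
    length_norm = 1.0 - b + b * len(doc_terms) / avg_len
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue
        weighted_tf = w * tf[term]  # assumption: type weight scales raw tf
        n_t = df.get(term, 0)
        idf = math.log(1.0 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        score += idf * weighted_tf * (k1 + 1.0) / (weighted_tf + k1 * length_norm)
    return score
```

Under these assumptions, a type that is over-represented among relevant documents receives a weight above 1 and its documents are scored as if their matching terms occurred more often; an under-represented type is pushed down in the same way.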
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Yeung, P.C.K., Büttcher, S., Clarke, C.L.A., Kolla, M. (2007). A Bayesian Approach for Learning Document Type Relevance. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_85
DOI: https://doi.org/10.1007/978-3-540-71496-5_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)