Abstract
This paper extends previous work on document retrieval and document type classification to address the problem of 'typed search'. Given a query and a designated document type, the search system retrieves and ranks documents not only by their relevance to the query, but also by their likelihood of being of the designated type. The paper formalizes the problem in a general framework consisting of a 'relevance model' and a 'type model'. The relevance model indicates whether a document is relevant to a query. The type model indicates whether a document belongs to the designated document type. We consider three methods for combining the models: linear combination of scores, thresholding on the type score, and a hybrid of the two. We take course page search and instruction document search as examples and conduct a series of experiments. Experimental results show that the proposed approaches can significantly outperform the baseline methods.
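The three combination methods named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the `Doc` record, the scores in [0, 1], the interpolation `weight`, the `threshold`, and in particular the reading of the hybrid as "filter by type score, then rank by the linear combination".

```python
from typing import List, NamedTuple


class Doc(NamedTuple):
    doc_id: str
    relevance: float   # hypothetical relevance-model score in [0, 1]
    type_score: float  # hypothetical type-model score in [0, 1]


def linear_combination(docs: List[Doc], weight: float = 0.5) -> List[Doc]:
    """Rank by a weighted sum of relevance and type scores (assumed form)."""
    def combined(d: Doc) -> float:
        return weight * d.relevance + (1.0 - weight) * d.type_score
    return sorted(docs, key=combined, reverse=True)


def thresholding(docs: List[Doc], threshold: float = 0.5) -> List[Doc]:
    """Drop documents whose type score is below the threshold,
    then rank the survivors by relevance alone."""
    kept = [d for d in docs if d.type_score >= threshold]
    return sorted(kept, key=lambda d: d.relevance, reverse=True)


def hybrid(docs: List[Doc], weight: float = 0.5,
           threshold: float = 0.5) -> List[Doc]:
    """One plausible hybrid: filter by type score, then rank the
    survivors by the linear combination."""
    kept = [d for d in docs if d.type_score >= threshold]
    return linear_combination(kept, weight)


# Toy scores, invented for illustration only.
docs = [
    Doc("a", relevance=0.9, type_score=0.2),
    Doc("b", relevance=0.6, type_score=0.9),
    Doc("c", relevance=0.8, type_score=0.6),
    Doc("d", relevance=0.3, type_score=0.95),
]

linear_ids = [d.doc_id for d in linear_combination(docs)]
threshold_ids = [d.doc_id for d in thresholding(docs)]
hybrid_ids = [d.doc_id for d in hybrid(docs)]
```

On these toy scores, document "a" is highly relevant but unlikely to be of the designated type: the linear combination ranks it last, while thresholding and the hybrid remove it from the result list entirely.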
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Xu, J., Cao, Y., Li, H., Craswell, N., Huang, Y. (2007). Searching Documents Based on Relevance and Type. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_60
DOI: https://doi.org/10.1007/978-3-540-71496-5_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)