Searching Documents Based on Relevance and Type

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

This paper extends previous work on document retrieval and document type classification, addressing the problem of ‘typed search’. Specifically, given a query and a designated document type, the search system retrieves and ranks documents based not only on their relevance to the query, but also on the likelihood that they belong to the designated document type. The paper formalizes the problem in a general framework consisting of a ‘relevance model’ and a ‘type model’. The relevance model indicates whether or not a document is relevant to a query. The type model indicates whether or not a document belongs to the designated document type. We consider three methods for combining the models: linear combination of scores, thresholding on the type score, and a hybrid of the previous two methods. We take course page search and instruction document search as examples and have conducted a series of experiments. Experimental results show that our proposed approaches can significantly outperform the baseline methods.
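The three combination methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the weight `weight`, the cutoff `threshold`, the assumption that both scores lie in a comparable range, and the reading of the hybrid as "filter by type score, then rank by linear combination" are all assumptions made here for illustration.

```python
from typing import List, Tuple

# Hypothetical input format: (doc_id, relevance_score, type_score).
# Both scores are assumed to be on a comparable scale, e.g. [0, 1].
Doc = Tuple[str, float, float]
Ranked = List[Tuple[str, float]]


def linear_combination(docs: List[Doc], weight: float = 0.5) -> Ranked:
    """Rank by a weighted sum of relevance and type scores.

    `weight` is an assumed interpolation parameter in [0, 1].
    """
    scored = [(doc_id, weight * rel + (1 - weight) * typ)
              for doc_id, rel, typ in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def threshold_on_type(docs: List[Doc], threshold: float = 0.5) -> Ranked:
    """Drop documents whose type score is below `threshold`,
    then rank the survivors by relevance score alone."""
    kept = [(doc_id, rel) for doc_id, rel, typ in docs if typ >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def hybrid(docs: List[Doc], weight: float = 0.5,
           threshold: float = 0.5) -> Ranked:
    """One possible hybrid (an assumption): filter by type score first,
    then rank the survivors by the linear combination."""
    kept = [doc for doc in docs if doc[2] >= threshold]
    return linear_combination(kept, weight)


if __name__ == "__main__":
    # Made-up scores, purely for demonstration.
    docs = [("a", 0.9, 0.2), ("b", 0.6, 0.9), ("c", 0.4, 0.6)]
    print(linear_combination(docs))
    print(threshold_on_type(docs))
    print(hybrid(docs))
```

In this toy example, document `a` is highly relevant but unlikely to be of the designated type, so thresholding removes it entirely, while linear combination only lowers its rank.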

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Xu, J., Cao, Y., Li, H., Craswell, N., Huang, Y. (2007). Searching Documents Based on Relevance and Type. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_60

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
