ABSTRACT
We address a specific enterprise document search scenario in which the information need is expressed elaborately: a short query of a few keywords is accompanied by examples of key reference pages. Given this setup, we investigate how the examples can be used to improve end-to-end performance on the document retrieval task. Our approach is based on a language modeling framework in which the query model is modified to resemble the example pages. We compare several methods for sampling expansion terms from the example pages to support query-dependent and query-independent query expansion; the latter is motivated by the wish to increase "aspect recall" and attempts to uncover aspects of the information need not captured by the query.
For evaluation purposes we use the CSIRO data set created for the TREC 2007 Enterprise track. The best performance is achieved by query models based on query-independent sampling of expansion terms from the example documents.
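The abstract does not spell out the estimation formulas, so the following is only a minimal sketch of the general idea of query-independent expansion from example pages. It assumes maximum-likelihood unigram models, uniform weights over the example documents, a top-k cutoff, and linear interpolation with the original query; the function names, the parameters `k` and `lam`, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mle_model(tokens):
    """Maximum-likelihood unigram model: P(t) = count(t) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def expansion_model(example_docs, k):
    """Query-independent sampling (assumed form): average the unigram
    models of the example documents with uniform document weights,
    keep the k most probable terms, and renormalize to sum to 1."""
    pooled = Counter()
    for doc in example_docs:
        for term, p in mle_model(doc).items():
            pooled[term] += p / len(example_docs)
    top = pooled.most_common(k)
    mass = sum(p for _, p in top)
    return {t: p / mass for t, p in top}


def interpolate(query_tokens, expansion, lam):
    """Mix the models: P(t|theta_Q) = lam * P(t|query) + (1 - lam) * P(t|expansion)."""
    original = mle_model(query_tokens)
    terms = set(original) | set(expansion)
    return {
        t: lam * original.get(t, 0.0) + (1 - lam) * expansion.get(t, 0.0)
        for t in terms
    }


# Toy data (hypothetical): a keyword query plus two tokenized example pages.
query = ["solar", "energy"]
examples = [
    ["solar", "panel", "efficiency", "solar", "cell"],
    ["photovoltaic", "cell", "energy", "research", "cell"],
]
model = interpolate(query, expansion_model(examples, k=2), lam=0.5)
```

In this toy run the term "cell" receives probability mass in the final query model even though it never appears in the keyword query, which is the kind of effect the "aspect recall" motivation points at. A query-dependent variant would additionally condition the term selection on the query terms; that is not shown here.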
Index Terms
- A few examples go a long way: constructing query models from elaborate query formulations