Abstract
Pseudo-relevance feedback has been shown to be an effective method in many information retrieval tasks. Various criteria have been proposed to rank terms extracted from the top-ranked documents of the initial retrieval results. However, most existing methods extract terms individually and do not consider the impact of relationships among terms and their combinations. In this study, we first re-examine the assumption that expansion terms can be selected independently and show that combinations of terms may heavily impact the final results. We then present a novel clustering-based method that selects expansion terms as a whole set. The main idea is to first simultaneously cluster terms and documents using non-negative matrix factorization, and then apply the Maximum Relevance and Minimum Redundancy (mRMR) criteria to select terms based on their clusters, term distributions, and other features. Experimental results on several TREC collections show that our proposed method significantly improves retrieval performance.
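The abstract names the two stages (NMF co-clustering, then mRMR selection) but not the exact features or scoring functions, so the following is only a minimal sketch of the general technique under stated assumptions, not the authors' implementation. It assumes a non-negative term-by-document count matrix built from the feedback documents, plain Lee and Seung multiplicative-update NMF, a precomputed per-term relevance score (the `relevance` array here is hypothetical), and redundancy measured as cosine similarity between terms' cluster-membership rows in `W`. The original mRMR formulation by Peng, Long and Ding uses mutual information for both relevance and redundancy; cosine similarity is a swapped-in stand-in. The function names `nmf` and `mrmr_select` are illustrative.

```python
import numpy as np


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative term-by-document matrix X (terms x docs) as X ~= W @ H.

    Uses Lee and Seung's multiplicative updates for the squared Frobenius loss.
    Row i of W holds term i's weight on each of the k clusters; column j of H
    holds document j's weight on each cluster, so terms and documents are
    clustered in the same latent space.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def mrmr_select(relevance, W, n_select):
    """Greedy mRMR selection in difference form: relevance minus mean redundancy.

    ASSUMPTION: redundancy between two terms is the cosine similarity of their
    cluster-membership rows in W (Peng et al. use mutual information instead).
    """
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    sim = Wn @ Wn.T  # term-by-term similarity in cluster space
    selected = []
    candidates = list(range(len(relevance)))

    def score(t):
        # The first pick is by relevance alone; later picks are penalized by
        # their average similarity to the terms already chosen.
        penalty = sim[t, selected].mean() if selected else 0.0
        return relevance[t] - penalty

    while candidates and len(selected) < n_select:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy feedback set: 4 candidate terms x 4 top-ranked documents (raw counts).
    X = np.array([[3, 2, 0, 0],
                  [3, 2, 0, 0],
                  [0, 0, 1, 4],
                  [0, 0, 2, 8]], dtype=float)
    W, H = nmf(X, k=2)
    # HYPOTHETICAL relevance score: each term's share of the feedback mass.
    relevance = X.sum(axis=1) / X.sum()
    print(mrmr_select(relevance, W, n_select=2))
```

Because the penalty is the mean similarity to all previously chosen terms, a highly relevant term that falls in the same cluster as an earlier pick can lose to a less relevant term from a different cluster, which is the set-level behavior the abstract motivates. Replacing the cosine similarity with a mutual-information estimate would move the redundancy term closer to the original mRMR formulation without changing the greedy loop.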
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, Y., Zhang, Q., Zhou, Y., Huang, X. (2010). Pseudo-Relevance Feedback Based on mRMR Criteria. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_20
DOI: https://doi.org/10.1007/978-3-642-17187-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science, Computer Science (R0)