Abstract
We propose a hybrid information retrieval (IR) procedure that builds on two well-known IR approaches: data fusion and query expansion via relevance feedback. The procedure is designed to exploit the strengths of both approaches while avoiding some of their weaknesses. We show that it is built on postulates that can be justified analytically and empirically. We also report an empirical investigation of the procedure, showing that it is superior to relevance feedback on some dimensions and comparable on others. The investigation also verifies the conditions under which the procedure could be beneficial.
Xu, Y., Benaroch, M. Information Retrieval with a Hybrid Automatic Query Expansion and Data Fusion Procedure. Information Retrieval 8, 41–65 (2005). https://doi.org/10.1023/B:INRT.0000048496.31867.62
DOI: https://doi.org/10.1023/B:INRT.0000048496.31867.62