
Flexible pseudo-relevance feedback via selective sampling

Published: 01 June 2005

Abstract

Although Pseudo-Relevance Feedback (PRF) is a widely used technique for improving average retrieval performance, it can actually hurt performance for around one-third of the topics in a given set. Flexible PRF, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic, has been proposed to make PRF more reliable. This paper explores a new, inexpensive Flexible PRF method called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents of the initial ranking as pseudo-relevant, as Traditional PRF does, is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails: our per-topic analyses show, for example, that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good predictor of performance.
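The abstract does not say how Selective Sampling decides which documents to skip, so the following is only a minimal Python sketch of the contrast it describes, under stated assumptions. `traditional_prf` takes the top P documents as pseudo-relevant. `selective_sample` is a hypothetical novelty rule invented for illustration: it walks down the ranking and skips a document whose term-set Jaccard similarity to any already-selected document exceeds a threshold `max_sim`. The Jaccard measure, the threshold, and every function name here are assumptions, not the paper's method. The sketch also computes the two per-topic quantities the abstract mentions: the proportion of truly relevant documents in a pseudo-relevant set, and per-topic win/tie/loss counts between two runs.

```python
from typing import List, Mapping, Sequence, Set, Tuple


def traditional_prf(ranked: Sequence[str], p: int) -> List[str]:
    """Traditional PRF: treat the top-p documents of the initial ranking as pseudo-relevant."""
    return list(ranked[:p])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Term-set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def selective_sample(
    ranked: Sequence[str],
    doc_terms: Mapping[str, Set[str]],
    p: int,
    max_sim: float = 0.5,
) -> List[str]:
    """HYPOTHETICAL novelty-based sampling (an assumption, not the paper's rule).

    Walks down the ranking and skips any document whose term overlap with an
    already-selected document exceeds max_sim, stopping once p documents are
    chosen. May return fewer than p documents if the ranking runs out.
    """
    selected: List[str] = []
    for doc in ranked:
        if len(selected) == p:
            break
        terms = doc_terms[doc]
        if all(jaccard(terms, doc_terms[s]) <= max_sim for s in selected):
            selected.append(doc)
    return selected


def pseudo_relevant_precision(selected: Sequence[str], relevant: Set[str]) -> float:
    """Proportion of truly relevant documents in the pseudo-relevant set."""
    if not selected:
        return 0.0
    return sum(doc in relevant for doc in selected) / len(selected)


def win_tie_loss(
    scores_a: Mapping[str, float], scores_b: Mapping[str, float], eps: float = 1e-9
) -> Tuple[int, int, int]:
    """Count topics (present in both runs) where run A beats, ties, or loses to run B."""
    wins = ties = losses = 0
    for topic in scores_a.keys() & scores_b.keys():
        diff = scores_a[topic] - scores_b[topic]
        if diff > eps:
            wins += 1
        elif diff < -eps:
            losses += 1
        else:
            ties += 1
    return wins, ties, losses


if __name__ == "__main__":
    # Toy data for one topic; document IDs and terms are made up.
    ranked = ["d1", "d2", "d3", "d4", "d5"]
    doc_terms = {
        "d1": {"prf", "query", "expansion"},
        "d2": {"prf", "query", "expansion", "feedback"},  # near-duplicate of d1
        "d3": {"topic", "difficulty"},
        "d4": {"query", "topic", "novelty"},
        "d5": {"prf", "topic"},
    }
    relevant = {"d1", "d4"}

    top = traditional_prf(ranked, 3)              # ['d1', 'd2', 'd3']
    sampled = selective_sample(ranked, doc_terms, 3)  # ['d1', 'd3', 'd4']
    print(top, pseudo_relevant_precision(top, relevant))
    print(sampled, pseudo_relevant_precision(sampled, relevant))

    # Per-topic scores (e.g. average precision) for two runs over three topics.
    print(win_tie_loss({"t1": 0.30, "t2": 0.25, "t3": 0.40},
                       {"t1": 0.28, "t2": 0.31, "t3": 0.40}))  # (1, 1, 1)
```

In this toy topic, d2 is skipped as a near-duplicate of d1, so the sampled set reaches the relevant d4. On other topics the same rule could just as easily skip a relevant document, so the example says nothing about which strategy wins on average.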




    Published In

    ACM Transactions on Asian Language Information Processing, Volume 4, Issue 2
    June 2005
    179 pages
    ISSN:1530-0226
    EISSN:1558-3430
    DOI:10.1145/1105696

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2005
    Published in TALIP Volume 4, Issue 2


    Author Tags

    1. Pseudo-relevance feedback
    2. flexible pseudo-relevance feedback
    3. selective sampling

    Qualifiers

    • Article


    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 22 Feb 2025

    Cited By

    • (2023) A discriminative method for global query expansion and term reweighting using co-occurrence graphs. Journal of Information Science 49:1 (183-206). DOI: 10.1177/0165551521998047. Online publication date: 1-Feb-2023
    • (2022) From Cluster Ranking to Document Ranking. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2137-2141). DOI: 10.1145/3477495.3531819. Online publication date: 6-Jul-2022
    • (2021) Learning to Rerank Schema Matches. IEEE Transactions on Knowledge and Data Engineering 33:8 (3104-3116). DOI: 10.1109/TKDE.2019.2962124. Online publication date: 1-Aug-2021
    • (2021) Information Retrieval based Improvising Search using Automatic Query Expansion. 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV) (1226-1230). DOI: 10.1109/ICICV50876.2021.9388573. Online publication date: 4-Feb-2021
    • (2021) Pseudo relevance feedback optimization. Information Retrieval 24:4-5 (269-297). DOI: 10.1007/s10791-021-09393-5. Online publication date: 1-Oct-2021
    • (2021) Term position‐based language model for information retrieval. Journal of the Association for Information Science and Technology 72:5 (627-642). DOI: 10.1002/asi.24431. Online publication date: 10-Apr-2021
    • (2020) A Pseudo-relevance feedback framework combining relevance matching and semantic matching for information retrieval. Information Processing & Management 57:6 (102342). DOI: 10.1016/j.ipm.2020.102342. Online publication date: Nov-2020
    • (2019) Adaptive Local Low-rank Matrix Approximation for Recommendation. ACM Transactions on Information Systems 37:4 (1-34). DOI: 10.1145/3360488. Online publication date: 16-Oct-2019
    • (2019) Relevance Feedback. ACM Transactions on Information Systems 37:4 (1-28). DOI: 10.1145/3360487. Online publication date: 4-Oct-2019
    • (2019) Cluster-Based Focused Retrieval. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (2305-2308). DOI: 10.1145/3357384.3358087. Online publication date: 3-Nov-2019