DOI: 10.1145/1321440.1321487
research-article

Hybrid results merging

Published: 06 November 2007

Abstract

Research on results merging in distributed information retrieval environments has followed two directions. Estimation approaches attempt to calculate the relevance of the returned documents through ad hoc methodologies (weighted score merging, regression, etc.), while download approaches retrieve the documents locally, partially or completely, in order to estimate their relevance first hand. Both have advantages and disadvantages. Download algorithms are assumed to be more effective, but they are very expensive in terms of time and bandwidth. Estimation approaches, on the other hand, usually rely on document relevance scores being returned by the remote collections in order to achieve maximum performance. In addition, regression algorithms, which have proved more effective than weighted score merging, rely on a significant number of overlap documents in order to function effectively, which in practice requires multiple interactions with the remote collections. The algorithm introduced here reconciles the two approaches, combining their strengths while minimizing their weaknesses. It is based on downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies. The proposed algorithm is tested in a variety of settings, and its performance is found to be better than that of estimation approaches while approximating that of download approaches.
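
The general idea of downloading a few documents per collection and regressing the rest can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say which documents are selected or which regression model is used, so the evenly spaced sampling in `pick_sample_ranks`, the linear model over the logarithm of the rank, and the `download_and_score` callback (a stand-in for fetching a document and scoring it locally) are all assumptions made for illustration. The sketch also assumes that the collections do not overlap.

```python
import math
from typing import Callable, Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # A single sample (or identical x values): fall back to a constant.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def pick_sample_ranks(n_results: int, budget: int) -> List[int]:
    """Assumed selection rule: evenly spaced 0-based ranks, first to last."""
    if budget >= n_results:
        return list(range(n_results))
    if budget <= 1:
        return [0]
    step = (n_results - 1) / (budget - 1)
    return sorted({round(i * step) for i in range(budget)})


def hybrid_merge(
    result_lists: Dict[str, List[str]],
    download_and_score: Callable[[str], float],
    budget: int = 3,
) -> List[Tuple[str, float]]:
    """Merge ranked lists from several collections into one scored list.

    result_lists maps a collection name to its ranked document ids.
    Only `budget` documents per collection are downloaded and scored
    locally; the scores of the others are estimated by regression.
    """
    merged: List[Tuple[str, float]] = []
    for doc_ids in result_lists.values():
        if not doc_ids:
            continue
        # Download step: score a small sample of documents first hand.
        sampled = pick_sample_ranks(len(doc_ids), budget)
        known = {r: download_and_score(doc_ids[r]) for r in sampled}
        # Estimation step: regress local score on log(rank) (assumed model).
        xs = [math.log(r + 1) for r in known]
        ys = list(known.values())
        a, b = fit_line(xs, ys)
        for r, doc_id in enumerate(doc_ids):
            score = known[r] if r in known else a * math.log(r + 1) + b
            merged.append((doc_id, score))
    # Final merged ranking, best score first.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged
```

With two collections of five results each and `budget=3`, the sketch downloads six documents instead of ten. Raising `budget` moves it toward a pure download approach, and lowering it moves it toward pure estimation.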

    Published In

    CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
    November 2007
    1048 pages
    ISBN: 9781595938039
    DOI: 10.1145/1321440

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed information retrieval
    2. results merging

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2022) Language-Preference-Based Re-ranking for Multilingual Swahili Information Retrieval. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, pages 144-152. DOI: 10.1145/3539813.3545131. Online publication date: 23-Aug-2022
    • (2017) Evaluation of Search Engine Weight by Considering Repeated Web Page Contents. Intelligent Automation & Soft Computing 23:4, pages 589-597. DOI: 10.1080/10798587.2017.1316083. Online publication date: 11-May-2017
    • (2016) Improving multiple search engines retrieval results using fusion. 2016 Annual Connecticut Conference on Industrial Electronics, Technology & Automation (CT-IETA), pages 1-7. DOI: 10.1109/CT-IETA.2016.7868241. Online publication date: Oct-2016
    • (2015) Improving Results Aggregation Strategies in Distributed Information Retrieval. International Journal of Engineering Research in Africa 17, pages 94-104. DOI: 10.4028/www.scientific.net/JERA.17.94. Online publication date: Jul-2015
    • (2013) Seeking beyond with IntegraL: A user study of sense-making enabled by anchor-based virtual integration of library systems. Journal of the American Society for Information Science and Technology 64:9, pages 1927-1945. DOI: 10.1002/asi.22904. Online publication date: 22-Jul-2013
    • (2010) Research proposal for distributed deep web search. Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management, pages 33-38. DOI: 10.1145/1871902.1871909. Online publication date: 30-Oct-2010
    • (2010) Collection-integral source selection for uncooperative distributed information retrieval environments. Information Sciences: an International Journal 180:14, pages 2763-2776. DOI: 10.1016/j.ins.2010.03.020. Online publication date: 1-Jul-2010
    • (2009) AnchorWoman. Proceedings of the 18th ACM conference on Information and knowledge management, pages 2089-2090. DOI: 10.1145/1645953.1646317. Online publication date: 2-Nov-2009
    • (2008) Integral based source selection for uncooperative distributed information retrieval environments. Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, pages 67-74. DOI: 10.1145/1458469.1458475. Online publication date: 30-Oct-2008
    • (2008) A Comparison of Centralized and Distributed Information Retrieval Approaches. Proceedings of the 2008 Panhellenic Conference on Informatics, pages 21-25. DOI: 10.1109/PCI.2008.18. Online publication date: 28-Aug-2008
