DOI: 10.1145/502585.502608
Article

Merging techniques for performing data fusion on the web

Published: 05 October 2001

ABSTRACT

Data fusion on the Web is the merging, into a single unified list, of the ranked document lists that several Web search engines return in response to a user query. Metasearch engines perform this merging, and their algorithms use the information available in the ranked lists supplied by the underlying search engines, such as the rank positions of the retrieved documents and their retrieval scores. In this paper, merging techniques are introduced that take into account not only the rank positions, but also the title and the summary accompanying each retrieved document. Furthermore, the data fusion process is viewed as similar to the combination of belief in uncertain reasoning and is modelled using the Dempster-Shafer theory of evidence. Our evaluation experiments indicate that these merging techniques improve effectiveness, and that their effectiveness is comparable to that of an approach that merges the ranked lists by downloading and analysing the Web documents.
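To make the idea of treating fusion as a combination of belief more concrete, the following is a minimal sketch of Dempster's rule of combination applied to two ranked lists. It is not the paper's algorithm: the abstract does not give the mass assignment, so the linear rank weighting in `rank_to_mass`, the `reliability` parameter, the `THETA` sentinel, and all function names are assumptions made for illustration only. Each engine's list becomes a mass function over single documents, with the leftover mass assigned to the whole frame (ignorance).

```python
from functools import reduce

# Hypothetical sentinel for the whole frame of discernment (total ignorance).
THETA = "<theta>"


def rank_to_mass(ranked_urls, reliability=0.8):
    """Turn one engine's ranked list into a mass function.

    Assumption (not from the paper): the document at 1-based rank r in a
    list of n results gets weight n - r + 1. Weights are scaled so they sum
    to `reliability`; the remaining 1 - reliability goes to THETA.
    """
    n = len(ranked_urls)
    total = n * (n + 1) / 2
    mass = {url: reliability * (n - i) / total
            for i, url in enumerate(ranked_urls)}
    mass[THETA] = 1.0 - reliability
    return mass


def combine(m1, m2):
    """Dempster's rule for mass functions whose focal elements are
    single documents plus THETA."""
    docs = (set(m1) | set(m2)) - {THETA}
    t1, t2 = m1[THETA], m2[THETA]
    raw = {}
    for d in docs:
        a, b = m1.get(d, 0.0), m2.get(d, 0.0)
        # {d} & {d} = {d}, {d} & THETA = {d}, THETA & {d} = {d}
        raw[d] = a * b + a * t2 + t1 * b
    raw[THETA] = t1 * t2
    # Mass on pairs of different documents is conflict (empty intersection);
    # dividing by the surviving mass is the 1 - K normalisation.
    norm = sum(raw.values())
    return {k: v / norm for k, v in raw.items()}


def merge(ranked_lists, reliability=0.8):
    """Fuse several ranked lists and return documents by combined belief."""
    masses = [rank_to_mass(lst, reliability) for lst in ranked_lists]
    combined = reduce(combine, masses)
    docs = [d for d in combined if d != THETA]
    return sorted(docs, key=lambda d: combined[d], reverse=True)


if __name__ == "__main__":
    engine_a = ["a", "b", "c"]
    engine_b = ["a", "c", "d"]
    print(merge([engine_a, engine_b]))
```

In this sketch a document returned by both engines accumulates belief from both sources, so it tends to rise above documents that only one engine returned. Evidence from titles and summaries, which the paper also uses, could in principle be folded into the per-document weights, but how to do so is not specified in the abstract and is left out here.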


Published in

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN: 1581134363
DOI: 10.1145/502585

Copyright © 2001 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
