ABSTRACT
Data fusion on the Web is the merging, into a single unified list, of the ranked document lists that several Web search engines retrieve in response to a user query. It is performed by metasearch engines, whose merging algorithms use the information available in the ranked lists supplied by the underlying search engines, such as the rank positions of the retrieved documents and their retrieval scores. This paper introduces merging techniques that take into account not only the rank positions but also the title and the summary accompanying each retrieved document. Furthermore, the data fusion process is viewed as analogous to the combination of belief in uncertain reasoning and is modelled using the Dempster-Shafer theory of evidence. Our evaluation experiments indicate that these merging techniques improve effectiveness, and that their effectiveness is comparable to that of an approach that merges the ranked lists by downloading and analysing the Web documents themselves.
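To make the idea of combining belief concrete, the following is a minimal sketch of Dempster's rule of combination applied to ranked lists. It is an illustration under stated assumptions, not the paper's method: the abstract does not say how evidence is assigned, so the function names (`combine`, `rank_masses`, `fuse`), the 1/rank weighting, and the `uncertainty` parameter are all hypothetical. Each engine is assumed to assign mass to individual documents and to leave the remainder on the whole set of documents as unassigned belief.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a focal element) to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures conflict between sources.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def rank_masses(ranked, frame, uncertainty=0.4):
    """Hypothetical rank-to-mass mapping (an assumption, not from the paper).

    Singleton mass is proportional to 1/rank; the `uncertainty` share
    stays on the whole frame as unassigned belief.
    """
    weights = [1.0 / r for r in range(1, len(ranked) + 1)]
    total = sum(weights)
    masses = {
        frozenset([doc]): (1.0 - uncertainty) * w / total
        for doc, w in zip(ranked, weights)
    }
    masses[frame] = uncertainty
    return masses


def fuse(lists, uncertainty=0.4):
    """Merge ranked lists by combining their mass functions pairwise."""
    frame = frozenset(doc for lst in lists for doc in lst)
    fused = None
    for lst in lists:
        m = rank_masses(lst, frame, uncertainty)
        fused = m if fused is None else combine(fused, m)
    # Rank documents by the combined mass on their singleton sets.
    singles = {next(iter(s)): w for s, w in fused.items() if len(s) == 1}
    return sorted(singles, key=lambda d: (-singles[d], d))


merged = fuse([["a", "b", "c"], ["b", "c", "d"]])
```

In this example, document `b` is ranked first in the merged list because both engines support it, whereas `a` is backed by only one. A fuller treatment would also derive evidence from titles and summaries, which this sketch does not attempt.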
Index Terms
- Merging techniques for performing data fusion on the web