Merging techniques for performing data fusion on the web

Authors:
Theodora Tsikrika

Queen Mary, University of London, London, UK

Mounia Lalmas

Queen Mary, University of London, London, UK


CIKM '01: Proceedings of the tenth international conference on Information and knowledge management, October 2001, Pages 127–134
https://doi.org/10.1145/502585.502608

Published: 05 October 2001

ABSTRACT

Data fusion on the Web refers to merging, into a single unified list, the ranked document lists retrieved by more than one Web search engine in response to a user query. It is performed by metasearch engines, whose merging algorithms use the information present in the ranked lists supplied by the underlying search engines, such as the rank positions of the retrieved documents and their retrieval scores. This paper introduces merging techniques that take into account not only the rank positions but also the title and the summary accompanying each retrieved document. Furthermore, the data fusion process is viewed as analogous to the combination of belief in uncertain reasoning and is modelled using the Dempster-Shafer theory of evidence. Our evaluation experiments indicate that these merging techniques yield improvements in effectiveness and that their effectiveness is comparable to that of the approach that merges the ranked lists by downloading and analysing the Web documents.
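
To make the idea of treating fusion as a combination of belief more concrete, the following minimal Python sketch applies Dempster's rule of combination to the evidence that several search engines provide about each document. It is an illustration under stated assumptions, not the technique evaluated in the paper: the abstract does not say how evidence is turned into mass functions, so the `rank_to_mass` mapping, the `trust` parameter, and the two-element frame are hypothetical, and the title and summary evidence is not modelled here.

```python
from itertools import product

# Frame of discernment for one document: it is relevant or it is not.
REL = frozenset({"relevant"})
FRAME = frozenset({"relevant", "not_relevant"})


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame) to a
    mass; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            # Mass that would fall on the empty set counts as conflict.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    norm = 1.0 - conflict
    return {subset: mass / norm for subset, mass in combined.items()}


def rank_to_mass(rank, list_length, trust=0.8):
    """Hypothetical mapping from a rank position to a mass function.

    Higher-ranked documents receive more support for 'relevant'; the
    remaining mass stays on the whole frame, i.e. it is uncommitted.
    """
    support = trust * (list_length - rank + 1) / list_length
    return {REL: support, FRAME: 1.0 - support}


def fuse(ranked_lists, trust=0.8):
    """Merge ranked lists by combining per-document evidence."""
    evidence = {}
    for docs in ranked_lists:
        for rank, doc in enumerate(docs, start=1):
            mass = rank_to_mass(rank, len(docs), trust)
            if doc in evidence:
                evidence[doc] = combine(evidence[doc], mass)
            else:
                evidence[doc] = mass
    # Order documents by the combined mass committed to 'relevant'.
    return sorted(evidence, key=lambda d: evidence[d][REL], reverse=True)


if __name__ == "__main__":
    engine_a = ["a", "b", "c"]
    engine_b = ["b", "d", "a"]
    print(fuse([engine_a, engine_b]))
```

In this sketch a document returned by several engines accumulates support, so it can overtake a document that only one engine ranked highly. Leaving part of each mass on the whole frame is one simple way to express that a rank position is only partial evidence of relevance.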

Index Terms

Merging techniques for performing data fusion on the web
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking
  2. Information storage systems

Published in
CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN: 1581134363
DOI: 10.1145/502585
Editors:
Henrique Paques, Georgia Institute of Technology
Ling Liu, Georgia Institute of Technology
David Grossman, Illinois Institute of Technology
General Chair:
Calton Pu, Georgia Institute of Technology
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Dempster-Shafer's theory of evidence
information retrieval
web data fusion
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%