ABSTRACT
Given the large number of search engines on the Internet, it is difficult for a person to determine which ones can serve their information needs. A common solution is to construct a metasearch engine on top of the search engines. Upon receiving a user query, the metasearch engine sends it to the underlying search engines that are likely to return the desired documents for the query. The selection algorithm that decides whether a search engine should receive the query typically bases its decision on the search-engine representative, which contains characteristic information about the search engine's database. However, an underlying search engine may not be willing to provide the needed information to the metasearch engine. This paper shows that the needed information can be estimated from an uncooperative search engine with good accuracy. Two pieces of information that permit accurate search engine selection are the number of documents indexed by the search engine and the maximum weight of each term. We present techniques for estimating these two pieces of information.
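To make the role of the representative concrete, the following is a minimal sketch of how a metasearch engine might use the two quantities named above once they are known. It is not the paper's selection algorithm: the class name `Representative`, the functions `similarity_upper_bound` and `select_engines`, the threshold, and the additive scoring rule with unit query-term weights are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Representative:
    """Hypothetical search-engine representative (field names are assumptions)."""
    num_docs: int                      # number of documents indexed by the engine
    max_weight: Dict[str, float] = field(default_factory=dict)  # term -> maximum weight


def similarity_upper_bound(rep: Representative, query_terms: List[str]) -> float:
    """Upper bound on the score of any single document for the query.

    Assumes an additive score with unit query-term weights, so no document
    can score more than the sum of the per-term maximum weights.
    """
    return sum(rep.max_weight.get(term, 0.0) for term in query_terms)


def select_engines(reps: Dict[str, Representative],
                   query_terms: List[str],
                   threshold: float) -> List[str]:
    """Return the names of non-empty engines whose upper bound reaches the threshold."""
    return [
        name
        for name, rep in reps.items()
        if rep.num_docs > 0 and similarity_upper_bound(rep, query_terms) >= threshold
    ]


# Illustrative data only; these values are not taken from the paper.
reps = {
    "engine_a": Representative(1000, {"metasearch": 0.6, "engine": 0.3}),
    "engine_b": Representative(500, {"engine": 0.2}),
}
selected = select_engines(reps, ["metasearch", "engine"], threshold=0.5)
```

Under these assumptions, an engine is skipped when even its best possible document could not reach the threshold, which is one reason a per-term maximum weight is useful to a selector.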
Discovering the representative of a search engine
CIKM '01: Proceedings of the Tenth International Conference on Information and Knowledge Management