ABSTRACT
AllInOneNews is the largest news metasearch engine in the world, connecting to over 1,000 news sites across 150 countries. Implementing a large-scale metasearch engine like AllInOneNews requires overcoming unique challenges that do not arise when building small metasearch engines, such as developing highly scalable search engine selection techniques. In this paper, we discuss these challenges and our solutions to them. We also discuss some novel features of AllInOneNews, such as its highly automated solution and semantic query matching. This paper also reports the results of a comparative evaluation of three commercial news search systems: one search engine (Google News) and two metasearch engines (Mamma News and AllInOneNews). Several measures, such as effectiveness, diversity, and time-sensitivity, are used to perform the comparison. Another contribution of this paper is a novel scheme for comparing multiple news search systems using a combined measure that takes both the relevance and the time-sensitivity of retrieved information into consideration.