DOI: 10.1145/1247480.1247601
Article

AllInOneNews: development and evaluation of a large-scale news metasearch engine

Published: 11 June 2007

ABSTRACT

AllInOneNews is the largest news metasearch engine in the world, connecting to over 1,000 news sites in over 150 countries. Implementing a large-scale metasearch engine like AllInOneNews requires overcoming unique challenges not faced when building small metasearch engines, such as developing highly scalable search engine selection techniques. In this paper, we discuss these challenges and our solutions to them. We also discuss some novel features of AllInOneNews, such as its highly automated solution and semantic query matching. This paper also reports the results of a comparative evaluation of three commercial news search systems: one search engine, Google News, and two metasearch engines, Mamma News and AllInOneNews. Several measures, including effectiveness, diversity, and time-sensitivity, are used to perform the comparison. Another contribution of this paper is a novel scheme for comparing multiple news search systems using a combined measure that takes both the relevance and the time-sensitivity of retrieved information into account.
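The abstract does not define how the combined relevance and time-sensitivity measure is computed. Purely as an illustration of the general idea, and not as the authors' scheme, one could score a system's top-k results by crediting each relevant item with a freshness weight that decays exponentially with its age at query time. The `Result` type, the `freshness` and `combined_score` functions, the 24-hour half-life, and the default k = 10 are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Result:
    # Hypothetical record for one retrieved news item.
    relevant: bool       # human relevance judgment for the item
    published: datetime  # publication time of the item


def freshness(published: datetime, query_time: datetime, half_life_hours: float) -> float:
    """Weight in (0, 1]: 1.0 at query time, halving every half_life_hours.

    Items dated after the query time are clamped to age 0 (weight 1.0).
    """
    age_hours = max((query_time - published).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def combined_score(results: Sequence[Result], query_time: datetime,
                   k: int = 10, half_life_hours: float = 24.0) -> float:
    """Average of relevance x freshness over the top-k positions.

    Dividing by k (not by the number of results returned) means empty
    positions count as 0, so returning fewer results is not rewarded.
    """
    top = results[:k]
    total = sum(freshness(r.published, query_time, half_life_hours)
                for r in top if r.relevant)
    return total / k


if __name__ == "__main__":
    now = datetime(2007, 6, 11, 12, 0)
    system_a = [
        Result(True, now - timedelta(hours=1)),   # relevant, very fresh
        Result(True, now - timedelta(hours=48)),  # relevant, two half-lives old
        Result(False, now),                       # fresh but not relevant
    ]
    print(round(combined_score(system_a, now, k=3), 3))  # 0.407
```

In this sketch the half-life parameter controls the trade-off: a short half-life makes time-sensitivity dominate, while a very long one reduces the score to plain precision at k.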


Published in

ACM Conferences
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN: 9781595936868
DOI: 10.1145/1247480
General Chairs: Lizhu Zhou, Tok Wang Ling
Program Chair: Beng Chin Ooi

                Copyright © 2007 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States


                Qualifiers

                • Article

                Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
