Abstract
The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.
- Anh, V. N., de Kretser, O., and Moffat, A. 2001. Vector-space ranking with effective early termination. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. 35--42.
- Anh, V. N. and Moffat, A. 1998. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia. 291--298.
- Anh, V. N. and Moffat, A. 2002. Impact transformation: Effective and efficient web retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Tampere, Finland. 3--10.
- Anh, V. N. and Moffat, A. 2005. Inverted index compression using word-aligned binary codes. Kluwer International Journal of Information Retrieval 8, 1, 151--166.
- Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A., and Raghavan, S. 2001. Searching the Web. ACM Trans. Internet Technol. 1, 1, 2--43.
- Badue, C., Baeza-Yates, R., Ribeiro-Neto, B., and Ziviani, N. 2001. Distributed query processing using partitioned inverted files. In Proceedings of String Processing and Information Retrieval Symposium, Laguna de San Rafael, Chile. G. Navarro, Ed. IEEE Computer Society, 10--20.
- Baeza-Yates, R., Moffat, A., and Navarro, G. 2002. Searching large text collections. In Handbook of Massive Data Sets, J. Abello, P. Pardalos, and M. Resende, Eds. Kluwer Academic Publishers, Boston, MA. 195--244.
- Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. ACM Press, New York, NY.
- Baeza-Yates, R. A. and Navarro, G. 2000. Block addressing indices for approximate text retrieval. J. Amer. Soc. Inform. Science 51, 1, 69--82.
- Barbará, D., Mehrotra, S., and Vallabhaneni, P. 1996. The Gold text indexing engine. In Proceedings of IEEE International Conference on Data Engineering, New Orleans, LA. S. Y. W. Su, Ed. IEEE Computer Society Press, Los Alamitos, CA. 172--179.
- Barroso, L. A., Dean, J., and Hölzle, U. 2003. Web search for a planet: The Google cluster architecture. IEEE Micro 23, 2 (April), 22--28.
- Bayer, R. and McCreight, E. 1972. Organization and maintenance of large ordered indexes. Acta Informatica 1, 173--189.
- Beaulieu, M., Baeza-Yates, R., Myaeng, S. H., and Järvelin, K., Eds. 2002. Proceedings of the 25th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval. Tampere, Finland, ACM Press.
- Bell, T. C., Cleary, J. G., and Witten, I. H. 1990. Text Compression. Prentice-Hall, Englewood Cliffs, NJ.
- Bell, T. C., Moffat, A., Nevill-Manning, C. G., Witten, I. H., and Zobel, J. 1993. Data compression in full-text retrieval systems. J. Amer. Soc. Inform. Science 44, 9 (Oct.), 508--531.
- Bertino, E., Ooi, B. C., Sacks-Davis, R., Tan, K.-L., Zobel, J., Shidlovsky, B., and Catania, B. 1997. Indexing Techniques for Advanced Database Systems. Kluwer Academic Publishers, Boston, MA.
- Bird, R. M., Newsbaum, J. B., and Trefftzs, J. L. 1978. Text file inversion: An evaluation. In Proceedings of the 4th Workshop on Computer Architecture for Non-Numeric Processing. Blue Mountain Lake, NY, ACM Press, 42--50.
- Bird, R. M., Tu, J. C., and Worthy, R. M. 1977. Associative/parallel processors for searching very large textual data bases. In Proceedings of the 3rd Non-Numeric Workshop. Syracuse, NY, ACM Press, 8--16.
- Blandford, D. and Blelloch, G. 2002. Index compression through document reordering. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 342--351.
- Bookstein, A. and Klein, S. T. 1990. Using bitmaps for medium sized information retrieval systems. Inform. Proc. Manag. 26, 525--533.
- Bookstein, A. and Klein, S. T. 1991a. Compression of correlated bit-vectors. Inform. Syst. 16, 4, 387--400.
- Bookstein, A. and Klein, S. T. 1991b. Flexible compression for bitmap sets. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 402--410.
- Bookstein, A. and Klein, S. T. 1991c. Generative models for bitmap sets with compression applications. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Chicago, IL. A. Bookstein, Y. Chiaramella, G. Salton, and V. V. Raghavan, Eds. ACM Press, 63--71.
- Bookstein, A. and Klein, S. T. 1992. Models of bitmap generation: A systematic approach to bitmap compression. Inform. Proc. Manag. 28, 6, 735--748.
- Bookstein, A., Klein, S. T., and Raita, T. 2000. Simple Bayesian model for bitmap compression. Kluwer Int. J. Inform. Retriev. 1, 4, 315--328.
- Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117.
- Brisaboa, N. R., Fariña, A., Navarro, G., and Esteller, M. F. 2003. (S,C)-dense coding: An optimized compression code for natural language text databases. In Proceedings of String Processing and Information Retrieval Symposium, Manaus, Brazil, M. A. Nascimento, Ed. Springer, 122--136.
- Brown, E. W. 1995. Fast evaluation of structured queries for information retrieval. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, E. A. Fox, P. Ingwersen, and R. Fidel, Eds. ACM Press, 30--38.
- Brown, E. W., Callan, J. P., and Croft, W. B. 1994. Fast incremental indexing for full-text information retrieval. In Proceedings of the International Conference on Very Large Databases, Santiago, Chile, J. B. Bocca, M. Jarke, and C. Zaniolo, Eds. Morgan Kaufmann, 192--202.
- Brown, E. W., Callan, J. P., Croft, W. B., Eliot, J., and Moss, B. 1994. Supporting full-text information retrieval with a persistent object store. In Proceedings of the International Conference on Advances in Database Technology (EDBT), Cambridge, UK, M. Jarke, J. A. B. Jr., and K. G. Jeffery, Eds. Springer, 365--378.
- Buckley, C. and Lewit, A. F. 1985. Optimisation of inverted vector searches. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, 97--110.
- Burkowski, F. J. 1990. Surrogate subsets: a free space management strategy for the index of a text retrieval system. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium, 211--226.
- Büttcher, S. and Clarke, C. L. A. 2005. Indexing time vs. query time trade-offs in dynamic information retrieval systems. In Proceedings of the International Conference on Information and Knowledge Management, Bremen, Germany, A. Chowdhury, N. Fuhr, M. Ronthaler, H.-J. Schek, and W. Teiken, Eds. ACM Press, 317--318.
- Cacheda, F., Plachouras, V., and Ounis, I. 2004. Performance analysis of distributed architectures to index one terabyte of text. In Proceedings of the European Conference on IR Research, Sunderland, UK, S. McDonald and J. Tait, Eds. 395--408. Lecture Notes in Computer Science, Springer, vol. 2997.
- Cahoon, B. and McKinley, K. S. 1996. Performance evaluation of a distributed architecture for information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 110--118.
- Cahoon, B., McKinley, K. S., and Lu, Z. 2000. Evaluating the performance of distributed architectures for information retrieval using a variety of workloads. ACM Trans. Inform. Syst. 18, 1 (Jan.) 1--43.
- Can, F. 1994. On the efficiency of best-match cluster searches. Inform. Proc. Manag. 30, 3, 343--361.
- Cardenas, A. 1975. Analysis and performance of inverted data base structures. Comm. ACM 18, 5, 253--263.
- Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y. S., and Soffer, A. 2001. Static index pruning for information retrieval systems. In Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval. New Orleans, LA. 43--50.
- Choueka, Y., Fraenkel, A., Klein, S., and Segal, E. 1987. Improved techniques for processing queries in full-text systems. In Proceedings of the 10th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, C. T. Yu and C. J. V. Rijsbergen, Eds. ACM Press, 306--315.
- Choueka, Y., Fraenkel, A. S., and Klein, S. T. 1988. Compression of concordances in full-text retrieval systems. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Grenoble, France, Y. Chiaramella, Ed. ACM Press, 597--612.
- Choueka, Y., Fraenkel, A. S., Klein, S. T., and Segal, E. 1986. Improved hierarchical bit-vector compression in document retrieval systems. In Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy, 88--97.
- Ciaccia, P., Tiberio, P., and Zezula, P. 1996. Declustering of key-based partitioned signature files. ACM Trans. Datab. Syst. 21, 3 (Sept.), 295--338.
- Ciaccia, P. and Zezula, P. 1993. Estimating accesses in partitioned signature file organizations. ACM Trans. Inform. Syst. 11, 2, 133--142.
- Clarke, C. L. A. and Cormack, G. V. 1995. Dynamic inverted indexes for a distributed full-text retrieval system. Tech. rep. MT-95-01, Department of Computer Science, University of Waterloo, Waterloo, Canada.
- Clarke, C. L. A. and Cormack, G. V. 2000. Shortest-substring retrieval and ranking. ACM Trans. Inform. Syst. 18, 1, 44--78.
- Clarke, C. L. A., Cormack, G. V., and Burkowski, F. J. 1994. Fast inverted indexes with on-line update. Tech. rep. CS-94-40, Department of Computer Science, University of Waterloo, Waterloo, Canada.
- Clarke, C. L. A., Cormack, G. V., and Tudhope, E. A. 2000. Relevance ranking for one to three term queries. Inform. Proc. Manag. 36, 2, 291--311.
- Couvreur, T. R., Benzel, R. N., Miller, S. F., Zeitler, D. N., Lee, D. L., Singhal, M., Shivaratri, N., and Wong, W. Y. P. 1994. An analysis of performance and cost factors in searching large text databases using parallel search systems. J. Amer. Soc. Inform. Science 45, 7, 443--464.
- Cringean, J. K., England, R., Manson, G. A., and Willett, P. 1990. Parallel text searching in serial files using a processor farm. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Brussels, Belgium, 429--453.
- Croft, W. B., Harper, D. J., Kraft, D. H., and Zobel, J., Eds. 2001. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA. ACM Press.
- Croft, W. B. and Lafferty, J. 2003. Language Modeling for Information Retrieval. Kluwer Academic Publishers, Dordrecht, The Netherlands.
- Croft, W. B., Moffat, A., van Rijsbergen, C. J., Wilkinson, R., and Zobel, J., Eds. 1998. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia. ACM Press.
- Croft, W. B. and Savino, P. 1988. Implementing ranking strategies using text signatures. ACM Trans. Office Inform. Syst. 6, 1, 42--62.
- Culpepper, J. S. and Moffat, A. 2005. Enhanced byte codes with restricted prefix properties. In Proceedings of the 12th International Symposium on String Processing and Information Retrieval. Buenos Aires, Argentina, M. P. Consens and G. Navarro, Eds. Lecture Notes in Computer Science, vol. 3772, Springer, 1--12.
- Cutting, D. and Pedersen, J. 1990. Optimisations for dynamic inverted index maintenance. In Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval. Brussels, Belgium, J.-L. Vidick, Ed. ACM Press, 405--411.
- de Kretser, O. and Moffat, A. 1999. Effective document presentation with a locality-based similarity heuristic. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. San Francisco, CA. 113--120.
- de Kretser, O. and Moffat, A. 2004. Seft: A search engine for text. Softw.---Prac. Exper. 34, 10 (Aug.), 1011--1023.
- de Kretser, O., Moffat, A., Shimmin, T., and Zobel, J. 1998. Methodologies for distributed information retrieval. In Proceedings of the IEEE International Conference on Distributed Computing Systems. Amsterdam, The Netherlands. M. P. Papazoglou, M. Takizawa, B. Krämer, and S. Chanson, Eds. IEEE Computer Society Press, Los Alamitos, CA. 66--73.
- Eastman, C. 1983. Current practice in the evaluation of multikey search algorithms. In Proceedings of the 6th International ACM SIGIR Conference on Research and Development in Information Retrieval. Washington DC. J. J. Kuehn, Ed. ACM Press. 197--204.
- Edmundson, H. P. and Wyllys, R. E. 1961. Automatic abstracting and indexing---survey and recommendations. Comm. ACM 4, 5 (May), 226--234.
- Elias, P. 1975. Universal codeword sets and representations of the integers. IEEE Trans. Inform. Theory IT-21, 2 (March), 194--203.
- Faloutsos, C. 1985a. Access methods for text. Comput. Surv. 17, 1, 49--74.
- Faloutsos, C. 1985b. Signature files: Design and performance comparison of some signature extraction methods. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Montreal, Canada, 63--82.
- Faloutsos, C. and Jagadish, H. V. 1992. Hybrid index organizations for text databases. In Proceedings of the International Conference on Extending Database Technology, Vienna, Austria, A. Pirotte, C. Delobel, and G. Gottlob, Eds. Lecture Notes in Computer Science, vol. 580, Springer, 310--327.
- Faloutsos, C. and Oard, D. W. 1995. A survey of information retrieval and filtering methods. Tech. rep., University of Maryland Institute for Advanced Computer Studies Report, University of Maryland at College Park, MD.
- Fox, E. A. and Lee, W. C. 1991. FAST-INV: A fast algorithm for building large inverted files. Tech. rep. TR 91--10, Virginia Polytechnic Institute and State University, Blacksburg, VA.
- Fraenkel, A. S. and Klein, S. T. 1985. Novel compression of sparse bit-strings---Preliminary report. In Combinatorial Algorithms on Words, Volume 12, A. Apostolico and Z. Galil, Eds. NATO ASI Series F. Springer, Berlin, Germany, 169--183.
- Frakes, W. B. and Baeza-Yates, R., Eds. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ.
- Frei, H.-P., Harman, D., Schäuble, P., and Wilkinson, R., Eds. 1996. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland. ACM Press.
- Gallager, R. G. and Van Voorhis, D. C. 1975. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inform. Theory IT-21, 2 (March), 228--230.
- Garcia, S., Williams, H. E., and Cannane, A. 2004. Access-ordered indexes. In Proceedings of the Australasian Computer Science Conference, Dunedin, New Zealand. V. Estivill-Castro, Ed. Australian Computer Society, 7--14.
- Golomb, S. W. 1966. Run-length encodings. IEEE Trans. Inform. Theory IT-12, 3 (July), 399--401.
- Grabs, T., Böhm, K., and Schek, H.-J. 2001. PowerDB-IR: Information retrieval on top of a database cluster. In Proceedings of the CIKM International Conference on Information and Knowledge Management, Atlanta, GA. H. Paques, L. Liu, and D. Grossman, Eds. ACM Press, 411--418.
- Grossman, D. A. and Frieder, O. 2004. Information Retrieval: Algorithms and Heuristics, 2nd Ed. Kluwer Academic Publishers, Dordrecht, The Netherlands.
- Harman, D., McCoy, W., Toense, R., and Candela, G. 1991. Prototyping a distributed information retrieval system using statistical ranking. Inform. Proc. Manag. 27, 5, 449--460.
- Harman, D. K. 1992. Ranking algorithms. In Information Retrieval: Data Structures and Algorithms. W. B. Frakes and R. Baeza-Yates, Eds. Prentice Hall, Englewood Cliffs, NJ. 362--392.
- Harman, D. K. and Candela, G. 1990. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. J. Amer. Soc. Inform. Science 41, 8 (Aug.), 581--589.
- Harper, D. J. 1982. An evaluation of four information storage and retrieval packages. Tech. rep. 7, CSIRO Division of Computing Research, Canberra, Australia.
- Haskin, R. L. 1980. Hardware for searching very large text databases. In Proceedings of the Workshop on Computer Architecture for Non-Numeric Processing, Pacific Grove, CA, ACM Press, 49--56.
- Hawking, D. 1997. Scalable text retrieval for large digital libraries. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries, Pisa, Italy. C. Thanos, Ed. Lecture Notes in Computer Science, vol. 1324. Springer, 127--145.
- Hawking, D. 1998. Efficiency/effectiveness trade-offs in query processing. ACM SIGIR Forum 32, 2, 16--22.
- Heaps, H. S. 1978. Information Retrieval, Computational and Theoretical Aspects. Academic Press, New York, NY.
- Hearst, M., Gey, F., and Tong, R., Eds. 1999. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, San Francisco, CA. ACM Press.
- Heinz, S. and Zobel, J. 2003. Efficient single-pass index construction for text databases. J. Amer. Soc. Inform. Science Techn. 54, 8, 713--729.
- Ivie, E. L. 1966. Search procedure based on measures of relatedness between documents. Ph.D. thesis. MIT, Cambridge, MA.
- Jakobsson, M. 1978. Huffman coding in bit-vector compression. Inform. Proc. Lett. 7, 6 (Oct.) 304--307.
- Jeong, B. S. and Omiecinski, E. 1995. Inverted file partitioning schemes in multiple disk systems. IEEE Trans. Parall. Distrib. Syst. 6, 2, 142--153.
- Jónsson, B. T., Franklin, M. J., and Srivastava, D. 1998. Interaction of query evaluation and buffer management for information retrieval. In Proceedings of the ACM-SIGMOD International Conference on the Management of Data, Seattle, WA. ACM Press, 118--129.
- Kaszkiel, M., Zobel, J., and Sacks-Davis, R. 1999. Efficient passage ranking for document databases. ACM Trans. Inform. Syst. 17, 4 (Oct.), 406--439.
- Kent, A. J., Sacks-Davis, R., and Ramamohanarao, K. 1990. A signature file scheme based on multiple organisations for indexing very large text databases. J. Amer. Soc. Inform. Science 41, 7, 508--534.
- Klein, S. T., Bookstein, A., and Deerwester, S. 1989. Storing text retrieval systems on CD-ROM: Compression and encryption considerations. ACM Trans. Office Inform. Syst. 7, 3, 230--245.
- Kleinberg, J. M. 1999. Authoritative sources in a hyper-linked environment. J. ACM 46, 5, 604--632.
- Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. Comput. Surv. 32, 2, 144--173.
- Kocberber, S. and Can, F. 1997. Vertical framing of superimposed signature files using partial evaluation of queries. Inform. Proc. Manag. 33, 3, 353--376.
- Lancaster, F. W. and Fayen, E. G. 1973. Information Retrieval OnLine. Melville, Los Angeles, CA.
- Lee, D. L., Kim, Y. M., and Patel, G. 1995. Efficient signature file methods for text retrieval. IEEE Trans. Knowl. Data Eng. 7, 3 (June) 423--435.
- Lee, D. L. and Leng, C.-W. 1989. Partitioned signature files: Design issues and performance evaluation. ACM Trans. Inform. Syst. 7, 2, 158--180.
- Lee, Y. K., Yoo, S. J., Yoon, K., and Berra, P. B. 1996. Index structures for structured documents. In Proceedings of the ACM Digital Libraries, Bethesda, MD, E. A. Fox and G. Marchionini, Eds. ACM Press, 91--99.
- Lempel, R. and Moran, S. 2003. Predictive caching and prefetching of query results in search engines. In Proceedings of the World Wide Web Conference. Budapest, Hungary. ACM Press, 19--28.
- Lempel, R. and Moran, S. 2004. Optimal result prefetching in web search engines with segmented indices. ACM Trans. Internet Techn. 4, 1, 31--59.
- Lempel, R. and Moran, S. 2005. Competitive caching of query results in search engines. Theoret. Comput. Science 324, 253--271.
- Lester, N., Moffat, A., Webber, W., and Zobel, J. 2005. Space-limited ranked query evaluation using adaptive pruning. In Proceedings of the 6th International Conference on Web Information Systems. Lecture Notes in Computer Science, vol. 3806, Springer, 470--477.
- Lester, N., Moffat, A., and Zobel, J. 2005. Fast on-line index construction by geometric partitioning. In Proceedings of the CIKM International Conference on Information and Knowledge Management, Bremen, Germany, A. Chowdhury, N. Fuhr, M. Ronthaler, H.-J. Schek, and W. Teiken, Eds. ACM Press, 776--783.
- Lester, N., Zobel, J., and Williams, H. E. 2006. Efficient online index maintenance for text retrieval systems. Inform. Proc. Manag. To appear.
- Lim, L., Wang, M., Padmanabhan, S., Vitter, J. S., and Agarwal, R. 2003. Dynamic maintenance of web indexes using landmarks. In Proceedings of the World-Wide Web Conference, Budapest, Hungary. Y.-F. R. Chen, L. Kovács, and S. Lawrence, Eds. ACM Press, 102--111.
- Linoff, G. and Stanfill, C. 1993. Compression of indexes with full positional information in very large text databases. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburgh, PA. R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM Press, 88--97.
- Lu, Z. and McKinley, K. S. 2003. Partial collection replication for information retrieval. Kluwer Int. J. Inform. Retriev. 6, 2, 159--198.
- Lucarella, D. 1988. A document retrieval system based upon nearest neighbour searching. J. Inform. Science 14, 25--33.
- Luhn, H. P. 1957. A statistical approach to mechanised encoding and searching of library information. IBM J. Res. Develop. 1, 309--317.
- MacFarlane, A., McCann, J. A., and Robertson, S. E. 2000. Parallel search using partitioned inverted files. In Proceedings of the String Processing and Information Retrieval Symposium. A Coruña, Spain. P. de la Fuente, Ed. IEEE Computer Society Press, Los Alamitos, CA. 209--220.
- Macleod, I. A., Martin, T. P., Nordin, B., and Phillips, J. R. 1987. Strategies for building distributed information retrieval systems. Inform. Proc. Manag. 23, 6, 511--528.
- Manber, U. and Wu, S. 1994. GLIMPSE: a tool to search through entire file systems. In USENIX Winter Technical Conference. San Francisco CA. USENIX Association, Berkeley, CA. 23--32.
- Manning, C. D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
- Maron, M. E. and Kuhns, J. L. 1960. On relevance, probabilistic indexing and information retrieval. J. ACM 7, 3, 216--244.
- Martin, T. P., Macleod, I. A., Russell, J. I., Leese, K., and Foster, B. 1990. A case study of caching strategies for a distributed full text retrieval system. Inform. Proc. Manag. 26, 2, 227--247.
- Martin, T. P. and Russell, J. I. 1991. Data caching strategies for distributed full text retrieval systems. Inform. Syst. 16, 1, 1--11.
- Matthew, F. W. and Thomson, L. 1967. Weighted term search: a computer program for an inverted coordinate search on magnetic tape. J. Chem. Document. 7, 1, 49--56.
- McDonell, K. J. 1977. An inverted index implementation. Comput. J. 20, 1, 116--123.
- Melnik, S., Raghavan, S., Yang, B., and Garcia-Molina, H. 2001. Building a distributed full-text index for the web. ACM Trans. Inform. Syst. 19, 3, 217--241.
- Moffat, A. 1992. Economical inversion of large text files. Comput. Syst. 5, 2, 125--139.
- Moffat, A. and Bell, T. A. H. 1995. In-situ generation of compressed inverted files. J. Amer. Soc. Inform. Science 46, 7 (Aug.) 537--550.
- Moffat, A. and Stuiver, L. 2000. Binary interpolative coding for effective index compression. Kluwer Int. J. Inform. Retriev. 3, 1 (July) 25--47.
- Moffat, A. and Turpin, A. 2002. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA.
- Moffat, A., Webber, W., Zobel, J., and Baeza-Yates, R. 2005. A pipelined architecture for distributed text query evaluation. Submitted for publication.
- Moffat, A. and Zobel, J. 1992a. Coding for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA, 72--81.
- Moffat, A. and Zobel, J. 1992b. Parameterised compression for sparse bitmaps. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, N. J. Belkin, P. Ingwersen, and A. M. Pejtersen, Eds. ACM Press, 274--285.
- Moffat, A. and Zobel, J. 1996. Self-indexing inverted files for fast text retrieval. ACM Trans. Inform. Syst. 14, 4 (Oct.) 349--379.
- Moffat, A. and Zobel, J. 2004. What does it mean to “measure performance”? In Proceedings of the 5th International Conference on Web Information Systems, Brisbane, Australia. X. Zhou, S. Su, M. P. Papazoglou, M. E. Orlowska, and K. Jeffrey, Eds. Lecture Notes in Computer Science, vol. 3306. Springer, 1--12.
- Moffat, A., Zobel, J., and Sacks-Davis, R. 1994. Memory efficient ranking. Inform. Proc. Manag. 30, 6 (Nov.) 733--744.
- Motzkin, D. 1994. On high performance of updates within an efficient document retrieval system. Inform. Proc. Manag. 30, 1, 93--118.
- Navarro, G., de Moura, E., Neubert, M., Ziviani, N., and Baeza-Yates, R. 2000. Adding compression to block addressing inverted indexes. Kluwer Int. J. Inform. Retriev. 3, 1, 49--77.
- Noreault, T., Koll, M., and McGill, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Amer. Soc. Inform. Science 28, 333--339.
- Perry, S. A. and Willett, P. 1983. A review of the use of inverted files for best match searching in information retrieval systems. J. Inform. Science 6, 59--66.
- Persin, M., Zobel, J., and Sacks-Davis, R. 1996. Filtered document retrieval with frequency-sorted indexes. J. Amer. Soc. Inform. Science 47, 10, 749--764.
- Ponte, J. M. and Croft, W. B. 1998. A language modelling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, 275--281.
- Rabitti, F., Ed. 1986. Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy, ACM Press.
- Reddaway, S. F. 1991. High speed text retrieval from large databases on a massively parallel processor. Inform. Proc. Manag. 27, 4, 311--316.
- Ribeiro-Neto, B. and Barbosa, R. 1998. Query performance for tightly coupled distributed digital libraries. In Proceedings of the ACM Digital Libraries, Pittsburgh, PA, I. Witten, R. Akscyn, and F. M. S. III, Eds. ACM Press, 182--190.
- Ribeiro-Neto, B., de Moura, E. S., Neubert, M. S., and Ziviani, N. 1999. Efficient distributed algorithms to build inverted files. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. San Francisco, CA, 105--112.
- Rice, R. F. 1979. Some practical universal noiseless coding techniques. Tech. Rep. 79--22, Jet Propulsion Laboratory, Pasadena, CA.
- Robertson, S. E. 1977. The probability ranking principle in IR. J. Document. 33, 4 (Dec.) 294--304.
- Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., and Gatford, M. 1994. Okapi at TREC-3. In Overview of the 3rd TREC Text REtrieval Conference, Gaithersburg, MD, D. Harman, Ed. NIST, NIST Special Publication 500-226.
- Rogers, W., Candela, G., and Harman, D. 1995. Space and time improvements for indexing in information retrieval. In Proceedings of the Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, L. Spitz and D. D. Lewis, Eds.
- Sacks-Davis, R., Kent, A. J., and Ramamohanarao, K. 1987. Multi-key access methods based on superimposed coding techniques. ACM Trans. Datab. Syst. 12, 4, 655--696.
- Salomon, D. 2000. Data Compression: The Complete Reference, 2nd Ed. Springer, Berlin, Germany.
- Salton, G. 1962. The use of citations as an aid to automatic content analysis. Tech. Rep. ISR-2, Section III, Harvard Computation Laboratory, Cambridge, MA.
- Salton, G. 1968. Automatic Index Organization and Retrieval. McGraw-Hill, New York, NY.]] Google ScholarDigital Library
- Salton, G., Ed. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Englewood Cliffs, NJ.]] Google ScholarDigital Library
- Salton, G. 1972. Dynamic document processing. Comm. ACM 15, 7, 658--668.]] Google ScholarDigital Library
- Salton, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.]] Google ScholarDigital Library
- Salton, G. and Buckley, C. 1988a. Parallel text search methods. Comm. ACM 31, 2 (Feb.) 202--215.]] Google ScholarDigital Library
- Salton, G. and Buckley, C. 1988b. Term-weighting approaches in automatic text retrieval. Inform. Proc. Manag. 24, 5, 513--523.]] Google ScholarDigital Library
- Salton, G. and McGill, M. J. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.]] Google ScholarDigital Library
- Salton, G., Wong, A., and Wang, C. S. 1975. A vector space model for automatic indexing. Comm. ACM 18, 11 (Nov.) 613--620.]] Google ScholarDigital Library
- Saraiva, P. C., de Moura, E. S., Ziviani, N., Fonseca, R., Meira, W., Murta, C., and Ribeiro-Neto, B. 2001. Rank-preserving two-level caching for scalable search engines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. 51--58.]] Google ScholarDigital Library
- Sayood, K. 2000. Introduction to Data Compression, 2nd Ed. Morgan Kaufmann, San Francisco, CA.
- Scholer, F., Williams, H. E., Yiannis, J., and Zobel, J. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Tampere, Finland. 222--229.
- Schuegraf, E. J. 1976. Compression of large inverted files with hyperbolic term distribution. Inform. Proc. Manag. 12, 377--384.
- Segesta, J. and Reid-Green, K. 2002. Harley Tillitt and computerized library searching. IEEE Ann. History Comput. 24, 3 (Sept.) 23--34.
- Severance, D. G. and Carlis, J. V. 1977. A practical approach to selecting record access paths. Comput. Surv. 9, 4, 259--272.
- Shieh, W.-Y., Chen, T.-F., and Chung, C.-P. 2003. A tree-based inverted file for fast ranked-document retrieval. In Proceedings of the International Conference on Information and Knowledge Engineering. Las Vegas, NV. H. R. Arabnia, Ed. CSREA Press, 64--69.
- Shieh, W.-Y., Chen, T.-F., Shann, J. J.-J., and Chung, C.-P. 2003. Inverted file compression through document identifier reassignment. Inform. Proc. Manag. 39, 1, 117--131.
- Shieh, W.-Y. and Chung, C.-P. 2005. A statistics-based approach to incrementally update inverted files. Inform. Proc. Manag. 41, 2, 275--288.
- Shieh, W.-Y., Shann, J. J.-J., and Chung, C.-P. 2003. An inverted file cache for fast information retrieval. J. Inform. Science Eng. 19, 4, 681--695.
- Shoens, K., Tomasic, A., and García-Molina, H. 1994. Synthetic workload performance analysis of incremental updates. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin, Ireland, W. B. Croft and C. J. van Rijsbergen, Eds. ACM Press, 329--338.
- Silvestri, F., Orlando, S., and Perego, R. 2004. Assigning identifiers to documents to enhance the clustering property of fulltext indexes. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Sheffield, England, M. Sanderson, K. Järvelin, J. Allan, and P. Bruza, Eds. ACM Press, 305--312.
- Singhal, A., Buckley, C., and Mitra, M. 1996. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 21--29.
- Smeaton, A. and van Rijsbergen, C. J. 1981. The nearest neighbour problem in information retrieval. In Proceedings of the 4th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Oakland, CA, C. J. Crouch, Ed. ACM Press, 83--87.
- Sparck Jones, K., Walker, S., and Robertson, S. E. 2000. A probabilistic model of information retrieval: Development and comparative experiments. Parts 1&2. Inform. Proc. Manag. 36, 6, 779--840.
- Sparck Jones, K. and Willett, P., Eds. 1997. Readings in Information Retrieval. Academic Press/Morgan Kaufmann, San Francisco, CA.
- Spink, A., Wolfram, D., Jansen, B. J., and Saracevic, T. 2001. Searching the Web: The public and their queries. J. Amer. Soc. Inform. Science 52, 3, 226--234.
- Spink, A. and Xu, J. L. 2000. Selected results from a large study of web searching: The Excite study. Inform. Resear.---Int. Electron. J. 6, 1.
- Stanfill, C. 1990. Partitioned posting files: A parallel inverted file structure for information retrieval. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium. 413--428.
- Stanfill, C., Thau, R., and Waltz, D. 1989. A parallel indexed algorithm for information retrieval. In Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Cambridge, MA, N. J. Belkin and C. J. van Rijsbergen, Eds. ACM Press, 88--97.
- Stellhorn, W. H. 1977. An inverted file processor for information retrieval. IEEE Trans. Comput. 26, 12, 1258--1267.
- Strohman, T., Turtle, H., and Croft, W. B. 2005. Optimization strategies for complex queries. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Salvador, Brazil, G. Marchionini, A. Moffat, J. Tate, R. Baeza-Yates, and N. Ziviani, Eds. ACM Press, 219--225.
- Tague, J., Ed. 1985. Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, ACM Press.
- Teuhola, J. 1978. A compression method for clustered bit-vectors. Inform. Proc. Lett. 7, 6 (Oct.) 308--311.
- Tomasic, A. and García-Molina, H. 1993. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of the International Conference on Parallel and Distributed Information Systems. San Diego, CA, M. J. Carey and P. Valduriez, Eds. IEEE Computer Society Press, 8--17.
- Tomasic, A. and García-Molina, H. 1996. Performance issues in distributed shared-nothing information-retrieval systems. Inform. Proc. Manag. 32, 6, 647--665.
- Tomasic, A., García-Molina, H., and Shoens, K. 1994. Incremental updates of inverted lists for text document retrieval. In Proceedings of the ACM-SIGMOD International Conference on the Management of Data. Minneapolis, MN, R. T. Snodgrass and M. Winslett, Eds. ACM Press, 289--300.
- Trotman, A. 2003. Compressing inverted files. Kluwer Int. J. Inform. Retriev. 6, 5--19.
- Turtle, H. and Flood, J. 1995. Query evaluation: Strategies and optimizations. Inform. Proc. Manag. 31, 6 (Nov.), 831--850.
- van Rijsbergen, C. J. 1979. Information Retrieval, 2nd Ed. Butterworths, London, UK.
- Vasanthakumar, S. R., Callan, J. P., and Croft, W. B. 1996. Integrating INQUERY with an RDBMS to support text retrieval. Bull. Techn. Comm. Data Eng. 19, 1, 24--33.
- Vidick, J. L., Ed. 1990. Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium. ACM Press.
- Voorhees, E. M. 1986. The efficiency of inverted index and cluster searches. In Proceedings of the 9th Annual ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy. 164--174.
- Voorhees, E. M. and Harman, D. K. 2005. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge, MA.
- Williams, H. E. and Zobel, J. 1999. Compressing integers for fast file access. Comput. J. 42, 3, 193--201.
- Williams, H. E., Zobel, J., and Anderson, P. 1999. What's next? Index structures for efficient phrase querying. In Proceedings of the Australasian Database Conference. Auckland, New Zealand. M. Orlowska, Ed. Australian Computer Society, 141--152.
- Williams, H. E., Zobel, J., and Bahle, D. 2004. Fast phrase querying with combined indexes. ACM Trans. Inform. Syst. 22, 4, 573--594.
- Witten, I. H., Bell, T. C., and Nevill, C. G. 1991. Models for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference. Snowbird, UT, J. A. Storer and J. H. Reif, Eds. IEEE Computer Society Press, Los Alamitos, CA. 23--32.
- Witten, I. H., Moffat, A., and Bell, T. C. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd Ed. Morgan Kaufmann, San Francisco, CA.
- Wong, W. Y. P. and Lee, D. K. 1993. Implementations of partial document ranking using inverted files. Inform. Proc. Manag. 29, 5 (Sept.), 647--669.
- Xi, W., Sornil, O., and Fox, E. A. 2002a. Hybrid partition inverted files for large-scale digital libraries. In Proceedings of the Digital Library: IT Opportunities and Challenges in the New Millennium. Beijing Library Press, Beijing, China, 404--418.
- Xi, W., Sornil, O., Luo, M., and Fox, E. A. 2002b. Hybrid partition inverted files: Experimental validation. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. Rome, Italy, M. Agosti and C. Thanos, Eds. Lecture Notes in Computer Science, vol. 2458, Springer, 422--431.
- Zezula, P., Rabitti, F., and Tiberio, P. 1991. Dynamic partitioning of signature files. ACM Trans. Inform. Syst. 9, 4 (Oct.), 336--369.
- Zobel, J. and Moffat, A. 1998. Exploring the similarity space. SIGIR Forum 32, 1, 18--34.
- Zobel, J., Moffat, A., and Ramamohanarao, K. 1996. Guidelines for presentation and comparison of indexing techniques. SIGMOD Record 25, 3 (Oct.), 10--15.
- Zobel, J., Moffat, A., and Ramamohanarao, K. 1998. Inverted files versus signature files for text indexing. ACM Trans. Datab. Syst. 23, 4 (Dec.), 453--490.
- Zobel, J., Moffat, A., and Sacks-Davis, R. 1992. An efficient indexing technique for full-text database systems. In Proc. VLDB Int. Conf. on Very Large Databases, L.-Y. Yuan, Ed. Morgan Kaufmann, Vancouver, 352--362.
- Zobel, J., Moffat, A., and Sacks-Davis, R. 1993a. Searching large lexicons for partially specified terms using compressed inverted files. In Proceedings of the International Conference on Very Large Databases. Dublin, Ireland, R. Agrawal, S. Baker, and D. Bell, Eds. Morgan Kaufmann, 290--301.
- Zobel, J., Moffat, A., and Sacks-Davis, R. 1993b. Storage management for files of dynamic records. In Proceedings of the Australasian Database Conference. Brisbane, Australia, 26--38.
- Zobel, J., Moffat, A., Wilkinson, R., and Sacks-Davis, R. 1995. Efficient retrieval of partial documents. Inform. Proc. Manag. 31, 3, 361--377.