Inverted files for text search engines

Published: 25 July 2006
Abstract

The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.

References

  1. Anh, V. N., de Kretser, O., and Moffat, A. 2001. Vector-space ranking with effective early termination. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. 35--42.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Anh, V. N. and Moffat, A. 1998. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia. 291--298.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Anh, V. N. and Moffat, A. 2002. Impact transformation: Effective and efficient web retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Tampere, Finland. 3--10.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Anh, V. N. and Moffat, A. 2005. Inverted index compression using word-aligned binary codes. Kluwer International Journal of Information Retrieval 8, 1, 151--166.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Arusu, A., Cho, J., Garcia-Molina, H., Paepcke, A., and Raghavan, S. 2001. Searching the Web. ACM Trans. Internet Technol. 1, 1, 2--43.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Badue, C., Baeza-Yates, R., Ribeiro-Neto, B., and Ziviani, N. 2001. Distributed query processing using partitioned inverted files. In Proceedings of String Processing and Information Retrieval Symposium, Laguna de San Rafael, Chile. G. Navarro, Ed. IEEE Computer Society, 10--20.]]Google ScholarGoogle Scholar
  7. Baeza-Yates, R., Moffat, A., and Navarro, G. 2002. Searching large text collections. In Handbook of Massive Data Sets, J. Abello, P. Pardalos, and M. Resende, Eds. Kluwer Academic Publishers, Boston, MA. 195--244.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. ACM Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Baeza-Yates, R. A. and Navarro, G. 2000. Block addressing indices for approximate text retrieval. J. Amer. Soc. Inform. Science 51, 1, 69--82.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Barbará, D., Mehrotra, S., and Vallabhaneni, P. 1996. The Gold text indexing engine. In Proceedings of IEEE International Conference on Data Engineering, New Orleans, LA. S. Y. W. Su, Ed. IEEE Computer Society Press, Los Alamitos, CA. 172--179.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Barroso, L. A., Dean, J., and Hölzle, U. 2003. Web search for a planet: The Google cluster architecture. IEEE Micro 23, 2 (April), 22--28.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bayer, R. and McCreight, R. 1972. Organization and maintenance of large ordered indexes. Acta Informatica 1, 173--189.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Beaulieu, M., Baeza-Yates, R., Myaeng, S. H., and Järvelin, K., Eds. 2002. Proceedings of the 25th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval. Tampere, Finland, ACM Press.]]Google ScholarGoogle Scholar
  14. Bell, T. C., Cleary, J. G., and Witten, I. H. 1990. Text Compression. Prentice-Hall, Englewood Cliffs, NJ.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Bell, T. C., Moffat, A., Nevill-Manning, C. G., Witten, I. H., and Zobel, J. 1993. Data compression in full-text retrieval systems. J. Amer. Soc. Inform. Science 44, 9 (Oct.), 508--531.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Bertino, E., Ooi, B. C., Sacks-Davis, R., Tan, K.-L., Zobel, J., Shidlovsky, B., and Catania, B. 1997. Indexing Techniques for Advanced Database Systems. Kluwer Academic Publishers, Boston, MA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Bird, R. M., Newsbaum, J. B., and Trefftzs, J. L. 1978. Text file inversion: An evaluation. In Proceedings of the 4th Workshop on Computer Architecture for Non-Numeric Processing. Blue Mountain Lake, NY, ACM Press, 42--50.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Bird, R. M., Tu, J. C., and Worthy, R. M. 1977. Associative/parallel processors for searching very large textual data bases. In Proceedings of the 3rd Non-Numeric Workshop. Syracuse, NY, ACM Press, 8--16.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Blandford, D. and Blelloch, G. 2002. Index compression through document reordering. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 342--351.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Bookstein, A. and Klein, S. T. 1990. Using bitmaps for medium sized information retrieval systems. Inform. Proces. Manag. 26, 525--533.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Bookstein, A. and Klein, S. T. 1991a. Compression of correlated bit-vectors. Inform. Syst. 16, 4, 387--400.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Bookstein, A. and Klein, S. T. 1991b. Flexible compression for bitmap sets. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 402--410.]]Google ScholarGoogle Scholar
  23. Bookstein, A. and Klein, S. T. 1991c. Generative models for bitmap sets with compression applications. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Chicago, IL. A. Bookstein, Y. Chiaramella, G. Salton, and V. V. Raghavan, Eds. ACM Press, 63--71.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Bookstein, A. and Klein, S. T. 1992. Models of bitmap generation: A systematic approach to bitmap compression. Inform. Proc. Manag. 28, 6, 735--748.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Bookstein, A., Klein, S. T., and Raita, T. 2000. Simple Bayesian model for bitmap compression. Kluwer Int. J. Inform. Retriev. 1, 4, 315--328.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Brisaboa, N. R., Fariña, A., Navarro, G., and Esteller, M. F. 2003. (S,C)-dense coding: An optimized compression code for natural language text databases. In Proceedings of String Processing and Information Retrieval Symposium, Manaus, Brazil, M. A. Nascimento, Ed. Springer, 122--136.]]Google ScholarGoogle Scholar
  28. Brown, E. W. 1995. Fast evaluation of structured queries for information retrieval. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, E. A. Fox, P. Ingwersen, and R. Fidel, Eds. ACM Press, 30--38.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Brown, E. W., Callan, J. P., and Croft, W. B. 1994. Fast incremental indexing for full-text information retrieval. In Proceedings of the International Conference on Very Large Databases, Santiago, Chile, J. B. Bocca, M. Jarke, and C. Zaniolo, Eds. Morgan Kaufmann, 192--202.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Brown, E. W., Callan, J. P., Croft, W. B., Eliot, J., and Moss, B. 1994. Supporting full-text information retrieval with a persistent object store. In Proceedings of the International Conference on Advances in Database Technology (EDBT), Cambridge, UK, M. Jarke, J. A. B. Jr., and K. G. Jeffery, Eds. Springer, 365--378.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Buckley, C. and Lewit, A. F. 1985. Optimisation of inverted vector searches. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, 97--110.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Burkowski, F. J. 1990. Surrogate subsets: a free space management strategy for the index of a text retrieval system. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium, 211--226.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Büttcher, S. and Clarke, C. L. A. 2005. Indexing time vs. query time trade-offs in dynamic information retrieval systems. In Proceedings of the International Conference on Information and Knowledge Management, Bremen, Germany, A. Chowdhury, N. Fuhr, M. Ronthaler, H.-J. Schek, and W. Teiken, Eds. ACM Press, 317--318.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Cacheda, F., Plachouras, V., and Ounis, I. 2004. Performance analysis of distributed architectures to index one terabyte of text. In Proceedings of the European Conference on IR Research, Sunderland, UK, S. McDonald and J. Tait, Eds. 395--408. Lecture Notes in Computer Science, Springer, vol. 2997.]]Google ScholarGoogle Scholar
  35. Cahoon, B. and McKinley, K. S. 1996. Performance evaluation of a distributed architecture for information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 110--118.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Cahoon, B., McKinley, K. S., and Lu, Z. 2000. Evaluating the performance of distributed architectures for information retrieval using a variety of workloads. ACM Trans. Inform. Syst. 18, 1 (Jan.) 1--43.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Can, F. 1994. On the efficiency of best-match cluster searches. Inform. Proc. Manag. 30, 3, 343--361.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Cardenas, A. 1975. Analysis and performance of inverted data base structures. Comm. ACM 18, 5, 253--263.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y. S., and Soffer, A. 2001. Static index pruning for information retrieval systems. In Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval. New Orleans, LA. 43--50.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Choueka, Y., Fraenkel, A., Klein, S., and Segal, E. 1987. Improved techniques for processing queries in full-text systems. In Proceedings of the 10th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, C. T. Yu and C. J. V. Rijsbergen, Eds. ACM Press, 306--315.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Choueka, Y., Fraenkel, A. S., and Klein, S. T. 1988. Compression of concordances in full-text retrieval systems. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Grenoble, France, Y. Chiaramella, Ed. ACM Press, 597--612.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Choueka, Y., Fraenkel, A. S., Klein, S. T., and Segal, E. 1986. Improved hierarchical bit-vector compression in document retrieval systems. In Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy, 88--97.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ciaccia, P., Tiberio, P., and Zezula, P. 1996. Declustering of key-based partitioned signature files. ACM Trans. Datab. Syst. 21, 3 (Sept.), 295--338.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Ciaccia, P. and Zezula, P. 1993. Estimating accesses in partitioned signature file organizations. ACM Trans. Inform. Syst. 11, 2, 133--142.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Clarke, C. L. A. and Cormack, G. V. 1995. Dynamic inverted indexes for a distributed full-text retrieval system. Tech. rep. MT-95-01, Department of Computer Science, University of Waterloo, Waterloo, Canada.]]Google ScholarGoogle Scholar
  46. Clarke, C. L. A. and Cormack, G. V. 2000. Shortest-substring retrieval and ranking. ACM Trans. Inform. Syst. 18, 1, 44--78.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Clarke, C. L. A., Cormack, G. V., and Burkowski, F. J. 1994. Fast inverted indexes with on-line update. Tech. rep. CS-94-40, Department of Computer Science, University of Waterloo, Waterloo, Canada.]]Google ScholarGoogle Scholar
  48. Clarke, C. L. A., Cormack, G. V., and Tudhope, E. A. 2000. Relevance ranking for one to three term queries. Inform. Proc. Manage. 36, 2, 291--311.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Couvreur, T. R., Benzel, R. N., Miller, S. F., Zeitler, D. N., Lee, D. L., Singhal, M., Shivaratri, N., and Wong, W. Y. P. 1994. An analysis of performance and cost factors in searching large text databases using parallel search systems. J. Amer. Soc. Inform. Science 45, 7, 443--464.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Cringean, J. K., England, R., Manson, G. A., and Willett, P. 1990. Parallel text searching in serial files using a processor farm. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Brussels, Belgium, 429--453.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Croft, W. B., Harper, D. J., Kraft, D. H., and Zobel, J., Eds. 2001. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA. ACM Press.]]Google ScholarGoogle Scholar
  52. Croft, W. B. and Lafferty, J. 2003. Language Modeling for Information Retrieval. Kluwer Academic Publishers, Dordrecht, The Netherlands.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Croft, W. B., Moffat, A., van Rijsbergen, C. J., Wilkinson, R., and Zobel, J., Eds. 1998. Proceedings of the 21th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia. ACM Press.]] Google ScholarGoogle Scholar
  54. Croft, W. B. and Savino, P. 1988. Implementing ranking strategies using text signatures. ACM Trans. Office Inform. Syst. 6, 1, 42--62.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Culpepper, J. S. and Moffat, A. 2005. Enhanced byte codes with restricted prefix properties. In Proceedings of the 12th International Symposium on String Processing and Information Retrieval. Buenos Aires, Argentina, M. P. Consens and G. Navarro, Eds. Lecture Notes in Computer Science, vol. 3772, Springer, 1--12.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Cutting, D. and Pedersen, J. 1990. Optimisations for dynamic inverted index maintenance. In Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval. Brussels, Belgium, J.-L. Vidick, Ed. ACM Press, 405--411.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. de Kretser, O. and Moffat, A. 1999. Effective document presentation with a locality-based similarity heuristic. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. San Francisco, CA. 113--120.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. de Kretser, O. and Moffat, A. 2004. Seft: A search engine for text. Softw.---Prac. Exper. 34, 10 (Aug.), 1011--1023.]]Google ScholarGoogle Scholar
  59. de Kretser, O., Moffat, A., Shimmin, T., and Zobel, J. 1998. Methodologies for distributed information retrieval. In Proceedings of the IEEE International Conference on Distributed Computing Systems. Amsterdam, The Netherlands. M. P. Papazoglou, M. Takizawa, B. Krämer, and S. Chanson, Eds. IEEE Computer Society Press, Los Alamitos, CA. 66--73.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Eastman, C. 1983. Current practice in the evaluation of multikey search algorithms. In Proceedings of the 6th International ACM SIGIR Conference on Research and Development in Information Retrieval. Washington DC. J. J. Kuehn, Ed. ACM Press. 197--204.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Edmundson, H. P. and Wyllys, R. E. 1961. Automatic abstracting and indexing---survey and recommendations. Comm. ACM 4, 5 (May), 226--234.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Elias, P. 1975. Universal codeword sets and representations of the integers. IEEE Trans. Inform. Theory IT-21, 2 (March), 194--203.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Faloutsos, C. 1985a. Access methods for text. Comput. Surv. 17, 1, 49--74.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Faloutsos, C. 1985b. Signature files: Design and performance comparison of some signature extraction methods. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Montreal, Canada, 63--82.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Faloutsos, C. and Jagadish, H. V. 1992. Hybrid index organizations for text databases. In Proceedings of the International Conference on Extending Database Technology, Vienna, Austria, A. Pirotte, C. Delobel, and G. Gottlob, Eds. Lecture Notes in Computer Science, vol. 580, Springer, 310--327.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Faloutsos, C. and Oard, D. W. 1995. A survey of information retrieval and filtering methods. Tech. rep., University of Maryland Institute for Advanced Computer Studies Report, University of Maryland at College Park, MD.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Fox, E. A. and Lee, W. C. 1991. FAST-INV: A fast algorithm for building large inverted files. Tech. rep. TR 91--10, Virginia Polytechnic Institute and State University, Blacksburg, VA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Fraenkel, A. S. and Klein, S. T. 1985. Novel compression of sparse bit-strings---Preliminary report. In Combinatorial Algorithms on Words, Volume 12, A. Apostolico and Z. Galil, Eds. NATO ASI Series F. Springer, Berlin, Germany, 169--183.]]Google ScholarGoogle Scholar
  69. Frakes, W. B. and Baeza-Yates, R., Eds. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Frei, H.-P., Harman, D., Schäuble, P., and Wilkinson, R., Eds. 1996. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland. ACM Press.]] Google ScholarGoogle Scholar
  71. Gallager, R. G. and Van Voorhis, D. C. 1975. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inform. Theory IT--21, 2 (March), 228--230.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Garcia, S., Williams, H. E., and Cannane, A. 2004. Access-ordered indexes. In Proceedings of the Australasian Computer Science Conference, Dunedin, New Zealand. V. Estivill-Castro, Ed. Australian Computer Society, 7--14.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Golomb, S. W. 1966. Run-length encodings. IEEE Trans. Inform. Theory IT--12, 3 (July), 399--401.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Grabs, T., Böhm, K., and Schek, H.-J. 2001. PowerDB-IR: Information retrieval on top of a database cluster. In Proceedings of the CIKM International Conference on Information and Knowledge Management, Atlanta, GA. H. Paques, L. Liu, and D. Grossman, Eds. ACM Press, 411--418.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Grossman, D. A. and Frieder, O. 2004. Information Retrieval: Algorithms and Heuristics, 2nd Ed. Kluwer Academic Publishers, Dordrecht, The Netherlands.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Harman, D., McCoy, W., Toense, R., and Candela, G. 1991. Prototyping a distributed information retrieval system using statistical ranking. Inform. Proc. Manag. 27, 5, 449--460.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Harman, D. K. 1992. Ranking algorithms. In Information Retrieval: Data Structures and Algorithms. W. B. Frakes and R. Baeza-Yates, Eds. Prentice Hall, Englewood Cliffs, NJ. 362--392.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Harman, D. K. and Candela, G. 1990. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. J. Amer. Soc. Inform. Science 41, 8 (Aug.), 581--589.]]Google ScholarGoogle ScholarCross RefCross Ref
  79. Harper, D. J. 1982. An evaluation of four information storage and retrieval packages. Tech. rep. 7, CSIRO Division of Computing Research, Canberra, Australia.]]Google ScholarGoogle Scholar
  80. Haskin, R. L. 1980. Hardware for searching very large text databases. In Proceedings of the Workshop on Computer Architecture for Non-Numeric Processing, Pacific Grove, CA, ACM Press, 49--56.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Hawking, D. 1997. Scalable text retrieval for large digital libraries. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries, Pisa, Italy. C. Thanos, Ed. Lecture Notes in Computer Science, vol. 1324. Springer, 127--145.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Hawking, D. 1998. Efficiency/effectiveness trade-offs in query processing. ACM SIGIR Forum 32, 2, 16--22.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Heaps, H. S. 1978. Information Retrieval, Computational and Theoretical Aspects. Academic Press, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Hearst, M., Gey, F., and Tong, R., Eds. 1999. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, San Francisco. CA, ACM Press.]] Google ScholarGoogle Scholar
  85. Heinz, S. and Zobel, J. 2003. Efficient single-pass index construction for text databases. J. Amer. Soc. Inform. Science Techn. 54, 8, 713--729.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Ivie, E. L. 1966. Search procedure based on measures of relatedness between documents. Ph.D. thesis. MIT, Cambridge, MA.]]Google ScholarGoogle Scholar
  87. Jakobsson, M. 1978. Huffman coding in bit-vector compression. Inform. Pro. Let. 7, 6 (Oct.) 304--307.]]Google ScholarGoogle Scholar
  88. Jeong, B. S. and Omiecinski, E. 1995. Inverted file partitioning schemes in multiple disk systems. IEEE Tran. Parall. Distrib. Syst. 6, 2, 142--153.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Jónsson, B. T., Franklin, M. J., and Srivastava, D. 1998. Interaction of query evaluation and buffer management for information retrieval. In Proceedings of the ACM-SIGMOD International Conference on the Management of Data, Seattle, WA. ACM Press, 118--129.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. Kaszkiel, M., Zobel, J., and Sacks-Davis, R. 1999. Efficient passage ranking for document databases. ACM Trans. Inform. Syst. 17, 4 (Oct.), 406--439.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Kent, A. J., Sacks-Davis, R., and Ramamohanarao, K. 1990. A signature file scheme based on multiple organisations for indexing very large text databases. J. Amer. Soc. Inform. Science 41, 7, 508--534.]]Google ScholarGoogle ScholarCross RefCross Ref
  92. Klein, S. T., Bookstein, A., and Deerwester, S. 1989. Storing text retrieval systems on CD-ROM: Compression and encryption considerations. ACM Trans. Office Inform. Syst. 7, 3, 230--245.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Kleinberg, J. M. 1999. Authoritative sources in a hyper-linked environment. J. ACM 46, 5, 604--632.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  94. Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. Comput. Surv. 32, 2, 144--173.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. Kocberbera, S. and Can, F. 1997. Vertical framing of superimposed signature files using partial evaluation of queries. Inform. Proc. Manag. 33, 3, 353--376.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Lancaster, F. W. and Fayen, E. G. 1973. Information Retrieval OnLine. Melville, Los Angeles, CA.]]Google ScholarGoogle Scholar
  97. Lee, D. L., Kim, Y. M., and Patel, G. 1995. Efficient signature file methods for text retrieval. IEEE Tran. Knowl. Data Eng. 7, 3 (June) 423--435.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. Lee, D. L. and Leng, C.-W. 1989. Partitioned signature files: Design issues and performance evaluation. ACM Trans. Inform. Syst. 7, 2, 158--180.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. Lee, Y. K., Yoo, S. J., Yoon, K., and Berra, P. B. 1996. Index structures for structured documents. In Proceedings of the ACM Digital Libraries, Bethesda, MD, E. A. Fox and G. Marchionini, Eds. ACM Press, 91--99.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. Lempel, R. and Moran, S. 2003. Predictive caching and prefetching of query results in search engines. In Proceedings of the World Wide Web Conference. Budapest, Hungary. ACM Press, 19--28.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Lempel, R. and Moran, S. 2004. Optimal result prefetching in web search engines with segmented indices. ACM Trans. Internet Techn. 4, 1, 31--59.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. Lempel, R. and Moran, S. 2005. Competitive caching of query results in search engines. Theoret. Comput. Science 324, 253--271.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Lester, N., Moffat, A., Webber, W., and Zobel, J. 2005. Space-limited ranked query evaluation using adaptive pruning. In Proceedings of the 6th International Conference on Web Informations Systems. Lecture Notes in Computer Science, vol. 3806, Springer, 470--477.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. Lester, N., Moffat, A., and Zobel, J. 2005. Fast on-line index construction by geometric partitioning. In Proceedings of the CIKM International Conference on Information and Knowledge Management, Bremen, Germany, A. Chowdhury, N. Fuhr, M. Ronthaler, H.-J. Schek, and W. Teiken, Eds. ACM Press, 776--783.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. Lester, N., Zobel, J., and Williams, H. E. 2006. Efficient online index maintenance for text retrieval systems. Inform. Proce. Manag. To appear.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Lim, L., Wang, M., Padmanabhan, S., Vitter, J. S., and Agarwal, R. 2003. Dynamic maintenance of web indexes using landmarks. In Proceedings of the World-Wide Web Conference, Budapest, Hungary. Y.-F. R. Chen, L. Kovács, and S. Lawrence, Eds. ACM Press, 102--111.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. Linoff, G. and Stanfill, C. 1993. Compression of indexes with full positional information in very large text databases. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburg, PA. R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM Press, 88--97.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. Lu, Z. and McKinley, K. S. 2003. Partial collection replication for information retrieval. Kluwer Int. J. Inform. Retrie. 6, 2, 159--198.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  109. Lucarella, D. 1988. A document retrieval system based upon nearest neighbour searching. J. Inform. Science 14, 25--33.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  110. Luhn, H. P. 1957. A statistical approach to mechanised encoding and searching of library information. IBM J. Resear. Develop. 1, 309--317.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. MacFarlane, A., McCann, J. A., and Robertson, S. E. 2000. Parallel search using partitioned inverted files. In Proceedings of the String Processing and Information Retrieval Symposium. A Coruña, Spain. P. de la Fuente, Ed. IEEE Computer Society Press, Los Alamitos, CA. 209--220.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. Macleod, I. A., Martin, T. P., Nordin, B., and Phillips, J. R. 1987. Strategies for building distributed information retrieval systems. Inform. Proc. Manag. 23, 6, 511--528.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. Manber, U. and Wu, S. 1994. GLIMPSE: a tool to search through entire file systems. In USENIX Winter Technical Conference. San Francisco CA. USENIX Association, Berkeley, CA. 23--32.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. Manning, C. D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press Cambridge, MA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  115. Maron, M. E. and Kuhns, J. L. 1960. On relevance, probabilistic indexing and information retrieval. J. ACM 7, 3, 216--244.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  116. Martin, T. P., Macleod, I. A., Russell, J. I., Leese, K., and Foster, B. 1990. A case study of caching strategies for a distributed full text retrieval system. Inform. Proc. Manag. 26, 2, 227--247.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  117. Martin, T. P. and Russell, J. I. 1991. Data caching strategies for distributed full text retrieval systems. Inform. Syst. 16, 1, 1--11.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  118. Matthew, F. W. and Thomson, L. 1967. Weighted term search: a computer program for an inverted coordinate search on magnetic tape. J. Chem. Document. 7, 1, 49--56.]]Google ScholarGoogle ScholarCross RefCross Ref
  119. McDonell, K. J. 1977. An inverted index implementation. Comput. J. 20, 1, 116--123.]]Google ScholarGoogle ScholarCross RefCross Ref
  120. Melnik, S., Raghavan, S., Yang, B., and Garcia-Molina, H. 2001. Building a distributed full-text index for the web. ACM Trans. Inform. Syst. 19, 3, 217--241.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  121. Moffat, A. 1992. Economical inversion of large text files. Comput. Syst. 5, 2, 125--139.]]Google ScholarGoogle Scholar
  122. Moffat, A. and Bell, T. A. H. 1995. In-situ generation of compressed inverted files. J. Ame. Soci. Inform. Science 46, 7 (Aug.) 537--550.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. Moffat, A. and Stuiver, L. 2000. Binary interpolative coding for effective index compression. Kluwer Int. J. Inform. Retriev. 3, 1 (July) 25--47.
  124. Moffat, A. and Turpin, A. 2002. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA.
  125. Moffat, A., Webber, W., Zobel, J., and Baeza-Yates, R. 2005. A pipelined architecture for distributed text query evaluation. Submitted for publication.
  126. Moffat, A. and Zobel, J. 1992a. Coding for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA, 72--81.
  127. Moffat, A. and Zobel, J. 1992b. Parameterised compression for sparse bitmaps. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, N. J. Belkin, P. Ingwersen, and A. M. Pejtersen, Eds. ACM Press, 274--285.
  128. Moffat, A. and Zobel, J. 1996. Self-indexing inverted files for fast text retrieval. ACM Trans. Inform. Syst. 14, 4 (Oct.) 349--379.
  129. Moffat, A. and Zobel, J. 2004. What does it mean to “measure performance”? In Proceedings of the 5th International Conference on Web Information Systems, Brisbane, Australia. X. Zhou, S. Su, M. P. Papazoglou, M. E. Orlowska, and K. Jeffrey, Eds. Lecture Notes in Computer Science, vol. 3306. Springer, 1--12.
  130. Moffat, A., Zobel, J., and Sacks-Davis, R. 1994. Memory efficient ranking. Inform. Proc. Manag. 30, 6 (Nov.) 733--744.
  131. Motzkin, D. 1994. On high performance of updates within an efficient document retrieval system. Inform. Proc. Manag. 30, 1, 93--118.
  132. Navarro, G., de Moura, E., Neubert, M., Ziviani, N., and Baeza-Yates, R. 2000. Adding compression to block addressing inverted indexes. Kluwer Int. J. Inform. Retriev. 3, 1, 49--77.
  133. Noreault, T., Koll, M., and McGill, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Amer. Soc. Inform. Science 28, 333--339.
  134. Perry, S. A. and Willett, P. 1983. A review of the use of inverted files for best match searching in information retrieval systems. J. Inform. Science 6, 59--66.
  135. Persin, M., Zobel, J., and Sacks-Davis, R. 1996. Filtered document retrieval with frequency-sorted indexes. J. Amer. Soc. Inform. Science 47, 10, 749--764.
  136. Ponte, J. M. and Croft, W. B. 1998. A language modelling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, 275--281.
  137. Rabitti, F., Ed. 1986. Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy, ACM Press.
  138. Reddaway, S. F. 1991. High speed text retrieval from large databases on a massively parallel processor. Inform. Proc. Manag. 27, 4, 311--316.
  139. Ribeiro-Neto, B. and Barbosa, R. 1998. Query performance for tightly coupled distributed digital libraries. In Proceedings of the ACM Digital Libraries, Pittsburgh, PA, I. Witten, R. Akscyn, and F. M. Shipman III, Eds. ACM Press, 182--190.
  140. Ribeiro-Neto, B., de Moura, E. S., Neubert, M. S., and Ziviani, N. 1999. Efficient distributed algorithms to build inverted files. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. San Francisco, CA, 105--112.
  141. Rice, R. F. 1979. Some practical universal noiseless coding techniques. Tech. Rep. 79--22, Jet Propulsion Laboratory, Pasadena, CA.
  142. Robertson, S. E. 1977. The probability ranking principle in IR. J. Document. 33, 4 (Dec.) 294--304.
  143. Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., and Gatford, M. 1994. Okapi at TREC-3. In Overview of the 3rd TREC Text REtrieval Conference, Gaithersburg, MD, D. Harman, Ed. NIST, NIST Special Publication 500-226.
  144. Rogers, W., Candela, G., and Harman, D. 1995. Space and time improvements for indexing in information retrieval. In Proceedings of the Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, L. Spitz and D. D. Lewis, Eds.
  145. Sacks-Davis, R., Kent, A. J., and Ramamohanarao, K. 1987. Multi-key access methods based on superimposed coding techniques. ACM Trans. Datab. Syst. 12, 4, 655--696.
  146. Salomon, D. 2000. Data Compression: The Complete Reference, 2nd Ed. Springer, Berlin, Germany.
  147. Salton, G. 1962. The use of citations as an aid to automatic content analysis. Tech. Rep. ISR-2, Section III, Harvard Computation Laboratory, Cambridge, MA.
  148. Salton, G. 1968. Automatic Index Organization and Retrieval. McGraw-Hill, New York, NY.
  149. Salton, G., Ed. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Englewood Cliffs, NJ.
  150. Salton, G. 1972. Dynamic document processing. Comm. ACM 15, 7, 658--668.
  151. Salton, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.
  152. Salton, G. and Buckley, C. 1988a. Parallel text search methods. Comm. ACM 31, 2 (Feb.) 202--215.
  153. Salton, G. and Buckley, C. 1988b. Term-weighting approaches in automatic text retrieval. Inform. Proc. Manag. 24, 5, 513--523.
  154. Salton, G. and McGill, M. J. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.
  155. Salton, G., Wong, A., and Yang, C. S. 1975. A vector space model for automatic indexing. Comm. ACM 18, 11 (Nov.) 613--620.
  156. Saraiva, P. C., de Moura, E. S., Ziviani, N., Fonseca, R., Meira, W., Murta, C., and Ribeiro-Neto, B. 2001. Rank-preserving two-level caching for scalable search engines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. 51--58.
  157. Sayood, K. 2000. Introduction to Data Compression, 2nd Ed. Morgan Kaufmann, San Francisco, CA.
  158. Scholer, F., Williams, H. E., Yiannis, J., and Zobel, J. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Tampere, Finland. 222--229.
  159. Schuegraf, E. J. 1976. Compression of large inverted files with hyperbolic term distribution. Inform. Proc. Manag. 12, 377--384.
  160. Segesta, J. and Reid-Green, K. 2002. Harley Tillitt and computerized library searching. IEEE Ann. History Comput. 24, 3 (Sept.) 23--34.
  161. Severance, D. G. and Carlis, J. V. 1977. A practical approach to selecting record access paths. Comput. Surv. 9, 4, 259--272.
  162. Shieh, W.-Y., Chen, T.-F., and Chung, C.-P. 2003. A tree-based inverted file for fast ranked-document retrieval. In Proceedings of the International Conference on Information and Knowledge Engineering. Las Vegas, NV. H. R. Arabnia, Ed. CSREA Press, 64--69.
  163. Shieh, W.-Y., Chen, T.-F., Shann, J. J.-J., and Chung, C.-P. 2003. Inverted file compression through document identifier reassignment. Inform. Proc. Manag. 39, 1, 117--131.
  164. Shieh, W.-Y. and Chung, C.-P. 2005. A statistics-based approach to incrementally update inverted files. Inform. Proc. Manag. 41, 2, 275--288.
  165. Shieh, W.-Y., Shann, J. J.-J., and Chung, C.-P. 2003. An inverted file cache for fast information retrieval. J. Inform. Science Eng. 19, 4, 681--695.
  166. Shoens, K., Tomasic, A., and García-Molina, H. 1994. Synthetic workload performance analysis of incremental updates. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin, Ireland, W. B. Croft and C. J. van Rijsbergen, Eds. ACM Press, 329--338.
  167. Silvestri, F., Orlando, S., and Perego, R. 2004. Assigning identifiers to documents to enhance the clustering property of fulltext indexes. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Sheffield, England, M. Sanderson, K. Järvelin, J. Allan, and P. Bruza, Eds. ACM Press, 305--312.
  168. Singhal, A., Buckley, C., and Mitra, M. 1996. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 21--29.
  169. Smeaton, A. and van Rijsbergen, C. J. 1981. The nearest neighbour problem in information retrieval. In Proceedings of the 4th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Oakland, CA, C. J. Crouch, Ed. ACM Press, 83--87.
  170. Sparck Jones, K., Walker, S., and Robertson, S. E. 2000. A probabilistic model of information retrieval: development and comparative experiments. Parts 1 & 2. Inform. Proc. Manag. 36, 6, 779--840.
  171. Sparck Jones, K. and Willett, P., Eds. 1997. Readings in Information Retrieval. Academic Press/Morgan Kaufmann, San Francisco, CA.
  172. Spink, A., Wolfram, D., Jansen, B. J., and Saracevic, T. 2001. Searching the Web: The public and their queries. J. Amer. Soc. Inform. Science 52, 3, 226--234.
  173. Spink, A. and Xu, J. L. 2000. Selected results from a large study of web searching: The Excite study. Inform. Resear.---Int. Electron. J. 6, 1.
  174. Stanfill, C. 1990. Partitioned posting files: a parallel inverted file structure for information retrieval. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium. 413--428.
  175. Stanfill, C., Thau, R., and Waltz, D. 1989. A parallel indexed algorithm for information retrieval. In Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Cambridge, MA, N. J. Belkin and C. J. van Rijsbergen, Eds. ACM Press, 88--97.
  176. Stellhorn, W. H. 1977. An inverted file processor for information retrieval. IEEE Trans. Comput. 26, 12, 1258--1267.
  177. Strohman, T., Turtle, H., and Croft, W. B. 2005. Optimization strategies for complex queries. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Salvador, Brazil, G. Marchionini, A. Moffat, J. Tate, R. Baeza-Yates, and N. Ziviani, Eds. ACM Press, 219--225.
  178. Tague, J., Ed. 1985. Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, ACM Press.
  179. Teuhola, J. 1978. A compression method for clustered bit-vectors. Inform. Proc. Lett. 7, 6 (Oct.) 308--311.
  180. Tomasic, A. and García-Molina, H. 1993. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of the International Conference on Parallel and Distributed Information Systems. San Diego, CA, M. J. Carey and P. Valduriez, Eds. IEEE Computer Society Press, 8--17.
  181. Tomasic, A. and García-Molina, H. 1996. Performance issues in distributed shared-nothing information-retrieval systems. Inform. Proc. Manag. 32, 6, 647--665.
  182. Tomasic, A., García-Molina, H., and Shoens, K. 1994. Incremental updates of inverted lists for text document retrieval. In Proceedings of the ACM-SIGMOD International Conference on the Management of Data. Minneapolis, MN, R. T. Snodgrass and M. Winslett, Eds. ACM Press, 289--300.
  183. Trotman, A. 2003. Compressing inverted files. Kluwer Int. J. Inform. Retriev. 6, 5--19.
  184. Turtle, H. and Flood, J. 1995. Query evaluation: strategies and optimizations. Inform. Proc. Manag. 31, 6 (Nov.), 831--850.
  185. van Rijsbergen, C. J. 1979. Information Retrieval, 2nd Ed. Butterworths, London, UK.
  186. Vasanthakumar, S. R., Callan, J. P., and Croft, W. B. 1996. Integrating INQUERY with an RDBMS to support text retrieval. Bull. Techn. Comm. Data Eng. 19, 1, 24--33.
  187. Vidick, J. L., Ed. 1990. Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium. ACM Press.
  188. Voorhees, E. M. 1986. The efficiency of inverted index and cluster searches. In Proceedings of the 9th Annual ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy. 164--174.
  189. Voorhees, E. M. and Harman, D. K. 2005. TREC: Experiment and evaluation in information retrieval. MIT Press, Cambridge, MA.
  190. Williams, H. E. and Zobel, J. 1999. Compressing integers for fast file access. Comput. J. 42, 3, 193--201.
  191. Williams, H. E., Zobel, J., and Anderson, P. 1999. What's next? Index structures for efficient phrase querying. In Proceedings of the Australasian Database Conference. Auckland, New Zealand. M. Orlowska, Ed. Australian Computer Society, 141--152.
  192. Williams, H. E., Zobel, J., and Bahle, D. 2004. Fast phrase querying with combined indexes. ACM Trans. Inform. Syst. 22, 4, 573--594.
  193. Witten, I. H., Bell, T. C., and Nevill, C. G. 1991. Models for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference. Snowbird, UT, J. A. Storer and J. H. Reif, Eds. IEEE Computer Society Press, Los Alamitos, CA. 23--32.
  194. Witten, I. H., Moffat, A., and Bell, T. C. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd Ed. Morgan Kaufmann, San Francisco, CA.
  195. Wong, W. Y. P. and Lee, D. K. 1993. Implementations of partial document ranking using inverted files. Inform. Proc. Manag. 29, 5 (Sept.), 647--669.
  196. Xi, W., Sornil, O., and Fox, E. A. 2002a. Hybrid partition inverted files for large-scale digital libraries. In Proceedings of the Digital Library: IT Opportunities and Challenges in the New Millennium. Beijing Library Press, Beijing, China, 404--418.
  197. Xi, W., Sornil, O., Luo, M., and Fox, E. A. 2002b. Hybrid partition inverted files: Experimental validation. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. Rome, Italy, M. Agosti and C. Thanos, Eds. Lecture Notes in Computer Science, vol. 2458, Springer, 422--431.
  198. Zezula, P., Rabitti, F., and Tiberio, P. 1991. Dynamic partitioning of signature files. ACM Trans. Inform. Syst. 9, 4 (Oct.), 336--369.
  199. Zobel, J. and Moffat, A. 1998. Exploring the similarity space. SIGIR Forum 32, 1, 18--34.
  200. Zobel, J., Moffat, A., and Ramamohanarao, K. 1996. Guidelines for presentation and comparison of indexing techniques. SIGMOD Record 25, 3 (Oct.), 10--15.
  201. Zobel, J., Moffat, A., and Ramamohanarao, K. 1998. Inverted files versus signature files for text indexing. ACM Trans. Datab. Syst. 23, 4 (Dec.), 453--490.
  202. Zobel, J., Moffat, A., and Sacks-Davis, R. 1992. An efficient indexing technique for full-text database systems. In Proc. VLDB Int. Conf. on Very Large Databases, L.-Y. Yuan, Ed. Morgan Kaufmann, Vancouver, 352--362.
  203. Zobel, J., Moffat, A., and Sacks-Davis, R. 1993a. Searching large lexicons for partially specified terms using compressed inverted files. In Proceedings of the International Conference on Very Large Databases. Dublin, Ireland, R. Agrawal, S. Baker, and D. Bell, Eds. Morgan Kaufmann, 290--301.
  204. Zobel, J., Moffat, A., and Sacks-Davis, R. 1993b. Storage management for files of dynamic records. In Proceedings of the Australasian Database Conference. Brisbane, Australia, 26--38.
  205. Zobel, J., Moffat, A., Wilkinson, R., and Sacks-Davis, R. 1995. Efficient retrieval of partial documents. Inform. Proc. Manag. 31, 3, 361--377.

Published in

ACM Computing Surveys, Volume 38, Issue 2
2006
145 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/1132956

Copyright © 2006 ACM

Publisher

Association for Computing Machinery, New York, NY, United States