Inverted files for text search engines

Published: 25 July 2006


The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.


Anh, V. N., de Kretser, O., and Moffat, A. 2001. Vector-space ranking with effective early termination. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. 35--42.]]
Anh, V. N. and Moffat, A. 1998. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia. 291--298.]]
Anh, V. N. and Moffat, A. 2002. Impact transformation: Effective and efficient web retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Tampere, Finland. 3--10.]]
Anh, V. N. and Moffat, A. 2005. Inverted index compression using word-aligned binary codes. Kluwer International Journal of Information Retrieval 8, 1, 151--166.]]
Arusu, A., Cho, J., Garcia-Molina, H., Paepcke, A., and Raghavan, S. 2001. Searching the Web. ACM Trans. Internet Technol. 1, 1, 2--43.]]
Badue, C., Baeza-Yates, R., Ribeiro-Neto, B., and Ziviani, N. 2001. Distributed query processing using partitioned inverted files. In Proceedings of String Processing and Information Retrieval Symposium, Laguna de San Rafael, Chile. G. Navarro, Ed. IEEE Computer Society, 10--20.]]
Baeza-Yates, R., Moffat, A., and Navarro, G. 2002. Searching large text collections. In Handbook of Massive Data Sets, J. Abello, P. Pardalos, and M. Resende, Eds. Kluwer Academic Publishers, Boston, MA. 195--244.]]
Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. ACM Press, New York, NY.]]
Baeza-Yates, R. A. and Navarro, G. 2000. Block addressing indices for approximate text retrieval. J. Amer. Soc. Inform. Science 51, 1, 69--82.]]
Barbará, D., Mehrotra, S., and Vallabhaneni, P. 1996. The Gold text indexing engine. In Proceedings of IEEE International Conference on Data Engineering, New Orleans, LA. S. Y. W. Su, Ed. IEEE Computer Society Press, Los Alamitos, CA. 172--179.]]
Barroso, L. A., Dean, J., and Hölzle, U. 2003. Web search for a planet: The Google cluster architecture. IEEE Micro 23, 2 (April), 22--28.]]
Bayer, R. and McCreight, R. 1972. Organization and maintenance of large ordered indexes. Acta Informatica 1, 173--189.]]
Beaulieu, M., Baeza-Yates, R., Myaeng, S. H., and Järvelin, K., Eds. 2002. Proceedings of the 25th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval. Tampere, Finland, ACM Press.]]
Bell, T. C., Cleary, J. G., and Witten, I. H. 1990. Text Compression. Prentice-Hall, Englewood Cliffs, NJ.]]
Bell, T. C., Moffat, A., Nevill-Manning, C. G., Witten, I. H., and Zobel, J. 1993. Data compression in full-text retrieval systems. J. Amer. Soc. Inform. Science 44, 9 (Oct.), 508--531.]]
Bertino, E., Ooi, B. C., Sacks-Davis, R., Tan, K.-L., Zobel, J., Shidlovsky, B., and Catania, B. 1997. Indexing Techniques for Advanced Database Systems. Kluwer Academic Publishers, Boston, MA.]]
Bird, R. M., Newsbaum, J. B., and Trefftzs, J. L. 1978. Text file inversion: An evaluation. In Proceedings of the 4th Workshop on Computer Architecture for Non-Numeric Processing. Blue Mountain Lake, NY, ACM Press, 42--50.]]
Bird, R. M., Tu, J. C., and Worthy, R. M. 1977. Associative/parallel processors for searching very large textual data bases. In Proceedings of the 3rd Non-Numeric Workshop. Syracuse, NY, ACM Press, 8--16.]]
Blandford, D. and Blelloch, G. 2002. Index compression through document reordering. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 342--351.]]
Bookstein, A. and Klein, S. T. 1990. Using bitmaps for medium sized information retrieval systems. Inform. Proces. Manag. 26, 525--533.]]
Bookstein, A. and Klein, S. T. 1991a. Compression of correlated bit-vectors. Inform. Syst. 16, 4, 387--400.]]
Bookstein, A. and Klein, S. T. 1991b. Flexible compression for bitmap sets. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 402--410.]]
Bookstein, A. and Klein, S. T. 1991c. Generative models for bitmap sets with compression applications. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Chicago, IL. A. Bookstein, Y. Chiaramella, G. Salton, and V. V. Raghavan, Eds. ACM Press, 63--71.]]
Bookstein, A. and Klein, S. T. 1992. Models of bitmap generation: A systematic approach to bitmap compression. Inform. Proc. Manag. 28, 6, 735--748.]]
Bookstein, A., Klein, S. T., and Raita, T. 2000. Simple Bayesian model for bitmap compression. Kluwer Int. J. Inform. Retriev. 1, 4, 315--328.]]
Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117.]]
Brisaboa, N. R., Fariña, A., Navarro, G., and Esteller, M. F. 2003. (S,C)-dense coding: An optimized compression code for natural language text databases. In Proceedings of String Processing and Information Retrieval Symposium, Manaus, Brazil, M. A. Nascimento, Ed. Springer, 122--136.]]
Brown, E. W. 1995. Fast evaluation of structured queries for information retrieval. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, E. A. Fox, P. Ingwersen, and R. Fidel, Eds. ACM Press, 30--38.]]
Brown, E. W., Callan, J. P., and Croft, W. B. 1994. Fast incremental indexing for full-text information retrieval. In Proceedings of the International Conference on Very Large Databases, Santiago, Chile, J. B. Bocca, M. Jarke, and C. Zaniolo, Eds. Morgan Kaufmann, 192--202.]]
Brown, E. W., Callan, J. P., Croft, W. B., Eliot, J., and Moss, B. 1994. Supporting full-text information retrieval with a persistent object store. In Proceedings of the International Conference on Advances in Database Technology (EDBT), Cambridge, UK, M. Jarke, J. A. B. Jr., and K. G. Jeffery, Eds. Springer, 365--378.]]
Buckley, C. and Lewit, A. F. 1985. Optimisation of inverted vector searches. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, 97--110.]]
Burkowski, F. J. 1990. Surrogate subsets: a free space management strategy for the index of a text retrieval system. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium, 211--226.]]
Büttcher, S. and Clarke, C. L. A. 2005. Indexing time vs. query time trade-offs in dynamic information retrieval systems. In Proceedings of the International Conference on Information and Knowledge Management, Bremen, Germany, A. Chowdhury, N. Fuhr, M. Ronthaler, H.-J. Schek, and W. Teiken, Eds. ACM Press, 317--318.]]
Cacheda, F., Plachouras, V., and Ounis, I. 2004. Performance analysis of distributed architectures to index one terabyte of text. In Proceedings of the European Conference on IR Research, Sunderland, UK, S. McDonald and J. Tait, Eds. 395--408. Lecture Notes in Computer Science, Springer, vol. 2997.]]
Cahoon, B. and McKinley, K. S. 1996. Performance evaluation of a distributed architecture for information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 110--118.]]
Cahoon, B., McKinley, K. S., and Lu, Z. 2000. Evaluating the performance of distributed architectures for information retrieval using a variety of workloads. ACM Trans. Inform. Syst. 18, 1 (Jan.) 1--43.]]
Can, F. 1994. On the efficiency of best-match cluster searches. Inform. Proc. Manag. 30, 3, 343--361.]]
Cardenas, A. 1975. Analysis and performance of inverted data base structures. Comm. ACM 18, 5, 253--263.]]
Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y. S., and Soffer, A. 2001. Static index pruning for information retrieval systems. In Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval. New Orleans, LA. 43--50.]]
Choueka, Y., Fraenkel, A., Klein, S., and Segal, E. 1987. Improved techniques for processing queries in full-text systems. In Proceedings of the 10th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, C. T. Yu and C. J. V. Rijsbergen, Eds. ACM Press, 306--315.]]
Choueka, Y., Fraenkel, A. S., and Klein, S. T. 1988. Compression of concordances in full-text retrieval systems. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Grenoble, France, Y. Chiaramella, Ed. ACM Press, 597--612.]]
Choueka, Y., Fraenkel, A. S., Klein, S. T., and Segal, E. 1986. Improved hierarchical bit-vector compression in document retrieval systems. In Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy, 88--97.]]
Ciaccia, P., Tiberio, P., and Zezula, P. 1996. Declustering of key-based partitioned signature files. ACM Trans. Datab. Syst. 21, 3 (Sept.), 295--338.]]
Ciaccia, P. and Zezula, P. 1993. Estimating accesses in partitioned signature file organizations. ACM Trans. Inform. Syst. 11, 2, 133--142.]]
Clarke, C. L. A. and Cormack, G. V. 1995. Dynamic inverted indexes for a distributed full-text retrieval system. Tech. rep. MT-95-01, Department of Computer Science, University of Waterloo, Waterloo, Canada.]]
Clarke, C. L. A. and Cormack, G. V. 2000. Shortest-substring retrieval and ranking. ACM Trans. Inform. Syst. 18, 1, 44--78.]]
Clarke, C. L. A., Cormack, G. V., and Burkowski, F. J. 1994. Fast inverted indexes with on-line update. Tech. rep. CS-94-40, Department of Computer Science, University of Waterloo, Waterloo, Canada.]]
Clarke, C. L. A., Cormack, G. V., and Tudhope, E. A. 2000. Relevance ranking for one to three term queries. Inform. Proc. Manage. 36, 2, 291--311.]]
Couvreur, T. R., Benzel, R. N., Miller, S. F., Zeitler, D. N., Lee, D. L., Singhal, M., Shivaratri, N., and Wong, W. Y. P. 1994. An analysis of performance and cost factors in searching large text databases using parallel search systems. J. Amer. Soc. Inform. Science 45, 7, 443--464.]]
Cringean, J. K., England, R., Manson, G. A., and Willett, P. 1990. Parallel text searching in serial files using a processor farm. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Brussels, Belgium, 429--453.]]
Croft, W. B., Harper, D. J., Kraft, D. H., and Zobel, J., Eds. 2001. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA. ACM Press.]]
Croft, W. B. and Lafferty, J. 2003. Language Modeling for Information Retrieval. Kluwer Academic Publishers, Dordrecht, The Netherlands.]]
Croft, W. B., Moffat, A., van Rijsbergen, C. J., Wilkinson, R., and Zobel, J., Eds. 1998. Proceedings of the 21th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia. ACM Press.]]
Croft, W. B. and Savino, P. 1988. Implementing ranking strategies using text signatures. ACM Trans. Office Inform. Syst. 6, 1, 42--62.]]
Culpepper, J. S. and Moffat, A. 2005. Enhanced byte codes with restricted prefix properties. In Proceedings of the 12th International Symposium on String Processing and Information Retrieval. Buenos Aires, Argentina, M. P. Consens and G. Navarro, Eds. Lecture Notes in Computer Science, vol. 3772, Springer, 1--12.]]
Cutting, D. and Pedersen, J. 1990. Optimisations for dynamic inverted index maintenance. In Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval. Brussels, Belgium, J.-L. Vidick, Ed. ACM Press, 405--411.]]
de Kretser, O. and Moffat, A. 1999. Effective document presentation with a locality-based similarity heuristic. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. San Francisco, CA. 113--120.]]
de Kretser, O. and Moffat, A. 2004. Seft: A search engine for text. Softw.---Prac. Exper. 34, 10 (Aug.), 1011--1023.]]
de Kretser, O., Moffat, A., Shimmin, T., and Zobel, J. 1998. Methodologies for distributed information retrieval. In Proceedings of the IEEE International Conference on Distributed Computing Systems. Amsterdam, The Netherlands. M. P. Papazoglou, M. Takizawa, B. Krämer, and S. Chanson, Eds. IEEE Computer Society Press, Los Alamitos, CA. 66--73.]]
Eastman, C. 1983. Current practice in the evaluation of multikey search algorithms. In Proceedings of the 6th International ACM SIGIR Conference on Research and Development in Information Retrieval. Washington DC. J. J. Kuehn, Ed. ACM Press. 197--204.]]
Edmundson, H. P. and Wyllys, R. E. 1961. Automatic abstracting and indexing---survey and recommendations. Comm. ACM 4, 5 (May), 226--234.]]
Elias, P. 1975. Universal codeword sets and representations of the integers. IEEE Trans. Inform. Theory IT-21, 2 (March), 194--203.]]
Faloutsos, C. 1985a. Access methods for text. Comput. Surv. 17, 1, 49--74.]]
Faloutsos, C. 1985b. Signature files: Design and performance comparison of some signature extraction methods. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Montreal, Canada, 63--82.]]
Faloutsos, C. and Jagadish, H. V. 1992. Hybrid index organizations for text databases. In Proceedings of the International Conference on Extending Database Technology, Vienna, Austria, A. Pirotte, C. Delobel, and G. Gottlob, Eds. Lecture Notes in Computer Science, vol. 580, Springer, 310--327.]]
Faloutsos, C. and Oard, D. W. 1995. A survey of information retrieval and filtering methods. Tech. rep., University of Maryland Institute for Advanced Computer Studies Report, University of Maryland at College Park, MD.]]
Fox, E. A. and Lee, W. C. 1991. FAST-INV: A fast algorithm for building large inverted files. Tech. rep. TR 91--10, Virginia Polytechnic Institute and State University, Blacksburg, VA.]]
Fraenkel, A. S. and Klein, S. T. 1985. Novel compression of sparse bit-strings---Preliminary report. In Combinatorial Algorithms on Words, Volume 12, A. Apostolico and Z. Galil, Eds. NATO ASI Series F. Springer, Berlin, Germany, 169--183.]]
Frakes, W. B. and Baeza-Yates, R., Eds. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ.]]
Frei, H.-P., Harman, D., Schäuble, P., and Wilkinson, R., Eds. 1996. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland. ACM Press.]]
Gallager, R. G. and Van Voorhis, D. C. 1975. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inform. Theory IT--21, 2 (March), 228--230.]]
Garcia, S., Williams, H. E., and Cannane, A. 2004. Access-ordered indexes. In Proceedings of the Australasian Computer Science Conference, Dunedin, New Zealand. V. Estivill-Castro, Ed. Australian Computer Society, 7--14.]]
Golomb, S. W. 1966. Run-length encodings. IEEE Trans. Inform. Theory IT--12, 3 (July), 399--401.]]
Grabs, T., Böhm, K., and Schek, H.-J. 2001. PowerDB-IR: Information retrieval on top of a database cluster. In Proceedings of the CIKM International Conference on Information and Knowledge Management, Atlanta, GA. H. Paques, L. Liu, and D. Grossman, Eds. ACM Press, 411--418.]]
Grossman, D. A. and Frieder, O. 2004. Information Retrieval: Algorithms and Heuristics, 2nd Ed. Kluwer Academic Publishers, Dordrecht, The Netherlands.]]
Harman, D., McCoy, W., Toense, R., and Candela, G. 1991. Prototyping a distributed information retrieval system using statistical ranking. Inform. Proc. Manag. 27, 5, 449--460.]]
Harman, D. K. 1992. Ranking algorithms. In Information Retrieval: Data Structures and Algorithms. W. B. Frakes and R. Baeza-Yates, Eds. Prentice Hall, Englewood Cliffs, NJ. 362--392.]]
Harman, D. K. and Candela, G. 1990. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. J. Amer. Soc. Inform. Science 41, 8 (Aug.), 581--589.]]
Harper, D. J. 1982. An evaluation of four information storage and retrieval packages. Tech. rep. 7, CSIRO Division of Computing Research, Canberra, Australia.]]
Haskin, R. L. 1980. Hardware for searching very large text databases. In Proceedings of the Workshop on Computer Architecture for Non-Numeric Processing, Pacific Grove, CA, ACM Press, 49--56.]]
Hawking, D. 1997. Scalable text retrieval for large digital libraries. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries, Pisa, Italy. C. Thanos, Ed. Lecture Notes in Computer Science, vol. 1324. Springer, 127--145.]]
Hawking, D. 1998. Efficiency/effectiveness trade-offs in query processing. ACM SIGIR Forum 32, 2, 16--22.]]
Heaps, H. S. 1978. Information Retrieval, Computational and Theoretical Aspects. Academic Press, New York, NY.]]
Hearst, M., Gey, F., and Tong, R., Eds. 1999. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, San Francisco. CA, ACM Press.]]
Heinz, S. and Zobel, J. 2003. Efficient single-pass index construction for text databases. J. Amer. Soc. Inform. Science Techn. 54, 8, 713--729.]]
Ivie, E. L. 1966. Search procedure based on measures of relatedness between documents. Ph.D. thesis. MIT, Cambridge, MA.]]
Jakobsson, M. 1978. Huffman coding in bit-vector compression. Inform. Pro. Let. 7, 6 (Oct.) 304--307.]]
Jeong, B. S. and Omiecinski, E. 1995. Inverted file partitioning schemes in multiple disk systems. IEEE Tran. Parall. Distrib. Syst. 6, 2, 142--153.]]
Jónsson, B. T., Franklin, M. J., and Srivastava, D. 1998. Interaction of query evaluation and buffer management for information retrieval. In Proceedings of the ACM-SIGMOD International Conference on the Management of Data, Seattle, WA. ACM Press, 118--129.]]
Kaszkiel, M., Zobel, J., and Sacks-Davis, R. 1999. Efficient passage ranking for document databases. ACM Trans. Inform. Syst. 17, 4 (Oct.), 406--439.]]
Kent, A. J., Sacks-Davis, R., and Ramamohanarao, K. 1990. A signature file scheme based on multiple organisations for indexing very large text databases. J. Amer. Soc. Inform. Science 41, 7, 508--534.]]
Klein, S. T., Bookstein, A., and Deerwester, S. 1989. Storing text retrieval systems on CD-ROM: Compression and encryption considerations. ACM Trans. Office Inform. Syst. 7, 3, 230--245.]]
Kleinberg, J. M. 1999. Authoritative sources in a hyper-linked environment. J. ACM 46, 5, 604--632.]]
Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. Comput. Surv. 32, 2, 144--173.]]
Kocberbera, S. and Can, F. 1997. Vertical framing of superimposed signature files using partial evaluation of queries. Inform. Proc. Manag. 33, 3, 353--376.]]
Lancaster, F. W. and Fayen, E. G. 1973. Information Retrieval OnLine. Melville, Los Angeles, CA.]]
Lee, D. L., Kim, Y. M., and Patel, G. 1995. Efficient signature file methods for text retrieval. IEEE Tran. Knowl. Data Eng. 7, 3 (June) 423--435.]]
Lee, D. L. and Leng, C.-W. 1989. Partitioned signature files: Design issues and performance evaluation. ACM Trans. Inform. Syst. 7, 2, 158--180.]]
Lee, Y. K., Yoo, S. J., Yoon, K., and Berra, P. B. 1996. Index structures for structured documents. In Proceedings of the ACM Digital Libraries, Bethesda, MD, E. A. Fox and G. Marchionini, Eds. ACM Press, 91--99.]]
Lempel, R. and Moran, S. 2003. Predictive caching and prefetching of query results in search engines. In Proceedings of the World Wide Web Conference. Budapest, Hungary. ACM Press, 19--28.]]
Lempel, R. and Moran, S. 2004. Optimal result prefetching in web search engines with segmented indices. ACM Trans. Internet Techn. 4, 1, 31--59.]]
Lempel, R. and Moran, S. 2005. Competitive caching of query results in search engines. Theoret. Comput. Science 324, 253--271.]]
Lester, N., Moffat, A., Webber, W., and Zobel, J. 2005. Space-limited ranked query evaluation using adaptive pruning. In Proceedings of the 6th International Conference on Web Informations Systems. Lecture Notes in Computer Science, vol. 3806, Springer, 470--477.]]
Lester, N., Moffat, A., and Zobel, J. 2005. Fast on-line index construction by geometric partitioning. In Proceedings of the CIKM International Conference on Information and Knowledge Management, Bremen, Germany, A. Chowdhury, N. Fuhr, M. Ronthaler, H.-J. Schek, and W. Teiken, Eds. ACM Press, 776--783.]]
Lester, N., Zobel, J., and Williams, H. E. 2006. Efficient online index maintenance for text retrieval systems. Inform. Proce. Manag. To appear.]]
Lim, L., Wang, M., Padmanabhan, S., Vitter, J. S., and Agarwal, R. 2003. Dynamic maintenance of web indexes using landmarks. In Proceedings of the World-Wide Web Conference, Budapest, Hungary. Y.-F. R. Chen, L. Kovács, and S. Lawrence, Eds. ACM Press, 102--111.]]
Linoff, G. and Stanfill, C. 1993. Compression of indexes with full positional information in very large text databases. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburg, PA. R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM Press, 88--97.]]
Lu, Z. and McKinley, K. S. 2003. Partial collection replication for information retrieval. Kluwer Int. J. Inform. Retrie. 6, 2, 159--198.]]
Lucarella, D. 1988. A document retrieval system based upon nearest neighbour searching. J. Inform. Science 14, 25--33.]]
Luhn, H. P. 1957. A statistical approach to mechanised encoding and searching of library information. IBM J. Resear. Develop. 1, 309--317.]]
MacFarlane, A., McCann, J. A., and Robertson, S. E. 2000. Parallel search using partitioned inverted files. In Proceedings of the String Processing and Information Retrieval Symposium. A Coruña, Spain. P. de la Fuente, Ed. IEEE Computer Society Press, Los Alamitos, CA. 209--220.]]
Macleod, I. A., Martin, T. P., Nordin, B., and Phillips, J. R. 1987. Strategies for building distributed information retrieval systems. Inform. Proc. Manag. 23, 6, 511--528.]]
Manber, U. and Wu, S. 1994. GLIMPSE: a tool to search through entire file systems. In USENIX Winter Technical Conference. San Francisco CA. USENIX Association, Berkeley, CA. 23--32.]]
Manning, C. D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press Cambridge, MA.]]
Maron, M. E. and Kuhns, J. L. 1960. On relevance, probabilistic indexing and information retrieval. J. ACM 7, 3, 216--244.]]
Martin, T. P., Macleod, I. A., Russell, J. I., Leese, K., and Foster, B. 1990. A case study of caching strategies for a distributed full text retrieval system. Inform. Proc. Manag. 26, 2, 227--247.]]
Martin, T. P. and Russell, J. I. 1991. Data caching strategies for distributed full text retrieval systems. Inform. Syst. 16, 1, 1--11.]]
Matthew, F. W. and Thomson, L. 1967. Weighted term search: a computer program for an inverted coordinate search on magnetic tape. J. Chem. Document. 7, 1, 49--56.]]
McDonell, K. J. 1977. An inverted index implementation. Comput. J. 20, 1, 116--123.]]
Melnik, S., Raghavan, S., Yang, B., and Garcia-Molina, H. 2001. Building a distributed full-text index for the web. ACM Trans. Inform. Syst. 19, 3, 217--241.]]
Moffat, A. 1992. Economical inversion of large text files. Comput. Syst. 5, 2, 125--139.]]
Moffat, A. and Bell, T. A. H. 1995. In-situ generation of compressed inverted files. J. Ame. Soci. Inform. Science 46, 7 (Aug.) 537--550.]]
Moffat, A. and Stuiver, L. 2000. Binary interpolative coding for effective index compression. Kluwer Int. J. Inform. Retriev. 3, 1 (July) 25--47.]]
Moffat, A. and Turpin, A. 2002. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA.]]
Moffat, A., Webber, W., Zobel, J., and Baeza-Yates, R. 2005. A pipelined architecture for distributed text query evaluation. Submitted for publication.]]
Moffat, A. and Zobel, J. 1992a. Coding for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA, 72--81.]]
Moffat, A. and Zobel, J. 1992b. Parameterised compression for sparse bitmaps. In Proceedings of the 5th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmaaank, N. J. Belkin, P. Ingwersen, and A. M. Pejtersen, Eds. ACM Press, 274--285.]]
Moffat, A. and Zobel, J. 1996. Self-indexing inverted files for fast text retrieval. ACM Trans. Informa. Syst. 14, 4 (Oct.) 349--379.]]
Moffat, A. and Zobel, J. 2004. What does it mean to “measure performance”? In Proceedings of the 5th International Conference on Web Informations Systems, Brisbane, Australia. X. Zhou, S. Su, M. P. Papazoglou, M. E. Owlowska, and K. Jeffrey, Eds. Lecture Notes in Computer Science, vol. 3306. Springer, 1--12.]]
Moffat, A., Zobel, J., and Sacks-Davis, R. 1994. Memory efficient ranking. Inform. Proc. Manag. 30, 6 (Nov.) 733--744.]]
Motzkin, D. 1994. On high performance of updates within an efficient document retrieval system. Inform. Proc. Manag. 30, 1, 93--118.]]
Navarro, G., de Moura, E., Neubert, M., Ziviani, N., and Baeza-Yates, R. 2000. Adding compression to block addressing inverted indexes. Kluwer Int. J. Inform. Retriev. 3, 1, 49--77.]]
Noreault, T., Koll, M., and McGill, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Amer, Soc. Inform. Science 28, 333--339.]]
Perry, S. A. and Willett, P. 1983. A review of the use of inverted files for best match searching in information retrieval systems. J. Inform. Science 6, 59--66.]]
Persin, M., Zobel, J., and Sacks-Davis, R. 1996. Filtered document retrieval with frequency-sorted indexes. J. Amer. Soc. Inform. Science 47, 10, 749--764.]]
Ponte, J. M. and Croft, W. B. 1998. A language modelling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, 275--281.]]
Rabitti, F., Ed. 1986. In Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy, ACM Press.]]
Reddaway, S. F. 1991. High speed text retrieval from large databases on a massively parallel processor. Inform. Proc. Manag. 27, 4, 311--316.]]
Ribeiro-Neto, B. and Barbosa, R. 1998. Query performance for tightly coupled distributed digital libraries. In Proceedings of the ACM Digital Libraries, Pittsburgh, PA, I. Witten, R. Akscyn, and F. M. S. III, Eds. ACM Press, 182--190.]]
Riberto-Neto, B., de Moura, E. S., Neubert, M. S., and Ziviani, N. 1999. Efficient distributed algorithms to build inverted files. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. San Francisco, CA, 105--112.]]
Rice, R. F. 1979. Some practical universal noiseless coding techniques. Tech. Rep. 79--22, Jet Propulsion Laboratory, Pasadena, CA.]]
Robertson, S. E. 1977. The probability ranking principle in IR. J. Document. 33, 4 (Dec.) 294--304.]]
Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., and Gatford, M. 1994. Okapi at TREC-3. In Overview of the 3rd TREC Text REtrieval Conference, Gaithersburg, MD, D. Harman, Ed. NIST, NIST Special Publication 500-226.]]
Rogers, W., Candela, G., and Harman, D. 1995. Space and time improvements for indexing in information retrieval. In Proceedings of the Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, L. Spitz and D. D. Lewis, Eds.]]
Sacks-Davis, R., Kent, A. J., and Ramamohanarao, K. 1987. Multi-key access methods based on superimposed coding techniques. ACM Trans. Datab. Syst. 12, 4, 655--696.]]
Salomon, D. 2000. Data Compression: The Complete Reference, 2nd Ed. Springer, Berlin, Germany.]]
Salton, G. 1962. The use of citations as an aid to automatic content analysis. Tech. Rep. ISR-2, Section III, Harvard Computation Laboratory, Cambridge, MA.]]
Salton, G. 1968. Automatic Index Organization and Retrieval. McGraw-Hill, New York, NY.]]
Salton, G., Ed. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Englewood Cliffs, NJ.]]
Salton, G. 1972. Dynamic document processing. Comm. ACM 15, 7, 658--668.]]
Salton, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.]]
Salton, G. and Buckley, C. 1988a. Parallel text search methods. Comm. ACM 31, 2 (Feb.) 202--215.]]
Salton, G. and Buckley, C. 1988b. Term-weighting approaches in automatic text retrieval. Inform. Proc. Manag. 24, 5, 513--523.]]
Salton, G. and McGill, M. J. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.]]
Salton, G., Wong, A., and Wang, C. S. 1975. A vector space model for automatic indexing. Comm. ACM 18, 11 (Nov.) 613--620.]]
Saraiva, P. C., de Moura, E. S., Ziviani, N., Fonseca, R., Meira, W., Murta, C., and Ribeiro-Neto, B. 2001. Rank-preserving two-level caching for scalable search engines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. 51--58.]]
Sayood, K. 2000. Introduction to Data Compression 2nd Ed. Morgan Kaufmann, San Francisco, CA.]]
Scholer, F., Williams, H. E., Yiannis, J., and Zobel, J. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Tampere, Finland. 222--229.]]
Schuegraf, E. J. 1976. Compression of large inverted files with hyperbolic term distribution. Inform. Proc. Manag. 12, 377--384.]]
Segesta, J. and Reid-Green, K. 2002. Harley Tillitt and computerized library searching. IEEE Ann. History Comput. 24, 3 (Sept.) 23--34.]]
Severance, D. G. and Carlis, J. V. 1977. A practical approach to selecting record access paths. Comput. Surv. 9, 4, 259--272.
Shieh, W.-Y., Chen, T.-F., and Chung, C.-P. 2003. A tree-based inverted file for fast ranked-document retrieval. In Proceedings of the International Conference on Information and Knowledge Engineering. Las Vegas, NV. H. R. Arabnia, Ed. CSREA Press, 64--69.
Shieh, W.-Y., Chen, T.-F., Shann, J. J.-J., and Chung, C.-P. 2003. Inverted file compression through document identifier reassignment. Inform. Proc. Manag. 39, 1, 117--131.
Shieh, W.-Y. and Chung, C.-P. 2005. A statistics-based approach to incrementally update inverted files. Inform. Proc. Manag. 41, 2, 275--288.
Shieh, W.-Y., Shann, J. J.-J., and Chung, C.-P. 2003. An inverted file cache for fast information retrieval. J. Inform. Science Eng. 19, 4, 681--695.
Shoens, K., Tomasic, A., and García-Molina, H. 1994. Synthetic workload performance analysis of incremental updates. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin, Ireland, W. B. Croft and C. J. van Rijsbergen, Eds. ACM Press, 329--338.
Silvestri, F., Orlando, S., and Perego, R. 2004. Assigning identifiers to documents to enhance the clustering property of fulltext indexes. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Sheffield, England, M. Sanderson, K. Järvelin, J. Allan, and P. Bruza, Eds. ACM Press, 305--312.
Singhal, A., Buckley, C., and Mitra, M. 1996. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Zurich, Switzerland, 21--29.
Smeaton, A. and van Rijsbergen, C. J. 1981. The nearest neighbour problem in information retrieval. In Proceedings of the 4th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Oakland, CA, C. J. Crouch, Ed. ACM Press, 83--87.
Sparck Jones, K., Walker, S., and Robertson, S. E. 2000. A probabilistic model of information retrieval: development and comparative experiments. Parts 1&2. Inform. Proc. Manag. 36, 6, 779--840.
Sparck Jones, K. and Willett, P., Eds. 1997. Readings in Information Retrieval. Academic Press/Morgan Kaufmann, San Francisco, CA.
Spink, A., Wolfram, D., Jansen, B. J., and Saracevic, T. 2001. Searching the Web: The public and their queries. J. Amer. Soc. Inform. Science 52, 3, 226--234.
Spink, A. and Xu, J. L. 2000. Selected results from a large study of web searching: The Excite study. Inform. Resear.---Int. Electron. J. 6, 1.
Stanfill, C. 1990. Partitioned posting files: a parallel inverted file structure for information retrieval. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium. 413--428.
Stanfill, C., Thau, R., and Waltz, D. 1989. A parallel indexed algorithm for information retrieval. In Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Cambridge, MA, N. J. Belkin and C. J. van Rijsbergen, Eds. ACM Press, 88--97.
Stellhorn, W. H. 1977. An inverted file processor for information retrieval. IEEE Trans. Comput. 26, 12, 1258--1267.
Strohman, T., Turtle, H., and Croft, W. B. 2005. Optimization strategies for complex queries. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Salvador, Brazil, G. Marchionini, A. Moffat, J. Tate, R. Baeza-Yates, and N. Ziviani, Eds. ACM Press, 219--225.
Tague, J., Ed. 1985. Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, ACM Press.
Teuhola, J. 1978. A compression method for clustered bit-vectors. Inform. Proc. Lett. 7, 6 (Oct.), 308--311.
Tomasic, A. and García-Molina, H. 1993. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of the International Conference on Parallel and Distributed Information Systems. San Diego, CA, M. J. Carey and P. Valduriez, Eds. IEEE Computer Society Press, 8--17.
Tomasic, A. and García-Molina, H. 1996. Performance issues in distributed shared-nothing information-retrieval systems. Inform. Proc. Manag. 32, 6, 647--665.
Tomasic, A., García-Molina, H., and Shoens, K. 1994. Incremental updates of inverted lists for text document retrieval. In Proceedings of the ACM-SIGMOD International Conference on the Management of Data. Minneapolis, MN, R. T. Snodgrass and M. Winslett, Eds. ACM Press, 289--300.
Trotman, A. 2003. Compressing inverted files. Kluwer Int. J. Inform. Retriev. 6, 5--19.
Turtle, H. and Flood, J. 1995. Query evaluation: strategies and optimizations. Inform. Proc. Manag. 31, 6 (Nov.), 831--850.
van Rijsbergen, C. J. 1979. Information Retrieval, 2nd Ed. Butterworths, London, UK.
Vasanthakumar, S. R., Callan, J. P., and Croft, W. B. 1996. Integrating INQUERY with an RDBMS to support text retrieval. Bull. Techn. Comm. Data Eng. 19, 1, 24--33.
Vidick, J. L., Ed. 1990. Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brussels, Belgium. ACM Press.
Voorhees, E. M. 1986. The efficiency of inverted index and cluster searches. In Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pisa, Italy. 164--174.
Voorhees, E. M. and Harman, D. K. 2005. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge, MA.
Williams, H. E. and Zobel, J. 1999. Compressing integers for fast file access. Comput. J. 42, 3, 193--201.
Williams, H. E., Zobel, J., and Anderson, P. 1999. What's next? Index structures for efficient phrase querying. In Proceedings of the Australasian Database Conference. Auckland, New Zealand. M. Orlowska, Ed. Australian Computer Society, 141--152.
Williams, H. E., Zobel, J., and Bahle, D. 2004. Fast phrase querying with combined indexes. ACM Trans. Inform. Syst. 22, 4, 573--594.
Witten, I. H., Bell, T. C., and Nevill, C. G. 1991. Models for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference. Snowbird, UT, J. A. Storer and J. H. Reif, Eds. IEEE Computer Society Press, Los Alamitos, CA. 23--32.
Witten, I. H., Moffat, A., and Bell, T. C. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd Ed. Morgan Kaufmann, San Francisco, CA.
Wong, W. Y. P. and Lee, D. K. 1993. Implementations of partial document ranking using inverted files. Inform. Proc. Manag. 29, 5 (Sept.), 647--669.
Xi, W., Sornil, O., and Fox, E. A. 2002a. Hybrid partition inverted files for large-scale digital libraries. In Proceedings of the Digital Library: IT Opportunities and Challenges in the New Millennium. Beijing Library Press, Beijing, China, 404--418.
Xi, W., Sornil, O., Luo, M., and Fox, E. A. 2002b. Hybrid partition inverted files: Experimental validation. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. Rome, Italy, M. Agosti and C. Thanos, Eds. Lecture Notes in Computer Science, vol. 2458, Springer, 422--431.
Zezula, P., Rabitti, F., and Tiberio, P. 1991. Dynamic partitioning of signature files. ACM Trans. Inform. Syst. 9, 4 (Oct.), 336--369.
Zobel, J. and Moffat, A. 1998. Exploring the similarity space. SIGIR Forum 32, 1, 18--34.
Zobel, J., Moffat, A., and Ramamohanarao, K. 1996. Guidelines for presentation and comparison of indexing techniques. SIGMOD Record 25, 3 (Oct.), 10--15.
Zobel, J., Moffat, A., and Ramamohanarao, K. 1998. Inverted files versus signature files for text indexing. ACM Trans. Datab. Syst. 23, 4 (Dec.), 453--490.
Zobel, J., Moffat, A., and Sacks-Davis, R. 1992. An efficient indexing technique for full-text database systems. In Proceedings of the International Conference on Very Large Databases. Vancouver, Canada, L.-Y. Yuan, Ed. Morgan Kaufmann, 352--362.
Zobel, J., Moffat, A., and Sacks-Davis, R. 1993a. Searching large lexicons for partially specified terms using compressed inverted files. In Proceedings of the International Conference on Very Large Databases. Dublin, Ireland, R. Agrawal, S. Baker, and D. Bell, Eds. Morgan Kaufmann, 290--301.
Zobel, J., Moffat, A., and Sacks-Davis, R. 1993b. Storage management for files of dynamic records. In Proceedings of the Australasian Database Conference. Brisbane, Australia, 26--38.
Zobel, J., Moffat, A., Wilkinson, R., and Sacks-Davis, R. 1995. Efficient retrieval of partial documents. Inform. Proc. Manag. 31, 3, 361--377.



Published In

ACM Computing Surveys  Volume 38, Issue 2
145 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Inverted file indexing
  2. Web search engine
  3. document database
  4. information retrieval
  5. text retrieval

