Compression and full-text indexing for Digital Libraries

  • Classification and Indexing
  • Conference paper
Digital Libraries Current Issues (DL 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 916)

Abstract

This chapter has demonstrated the feasibility of full-text indexing of large information bases. The use of modern compression techniques means that there is no space penalty: large document databases can be compressed and indexed in less than a third of the space required by the originals. Surprisingly, there is little or no time penalty either: querying can be faster because less information needs to be read from disk. Simple queries can be answered in a second; more complex ones with more query terms may take a few seconds. One important application is the creation of static databases on CD-ROM, and a 1.5 gigabyte document database can be compressed onto a standard 660 megabyte CD-ROM.

Creating a compressed and indexed document database containing hundreds of thousands of documents and gigabytes of data takes a few hours. Whereas retrieval can be done on ordinary workstations, creation requires a machine with a fair amount of main memory.
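The CD-ROM figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not code from the chapter: it assumes decimal megabytes (1 MB = 10^6 bytes), treats "less than a third" as an upper bound of exactly one third on the combined size of compressed text and index, and uses hypothetical names (`compressed_bytes`, `fits_on_disc`).

```python
import math
from fractions import Fraction

MB = 10**6  # assumption: decimal megabytes; the abstract does not say which unit is meant


def compressed_bytes(original_bytes: int, ratio: Fraction) -> int:
    """Compressed text plus index, as a fraction of the original, rounded up to whole bytes."""
    return math.ceil(original_bytes * ratio)


def fits_on_disc(original_bytes: int, ratio: Fraction, capacity_bytes: int) -> bool:
    """True if the compressed and indexed database fits within the disc capacity."""
    return compressed_bytes(original_bytes, ratio) <= capacity_bytes


original = 1500 * MB           # the 1.5 gigabyte document database from the abstract
cd_capacity = 660 * MB         # the standard CD-ROM capacity quoted in the abstract
one_third = Fraction(1, 3)     # upper bound implied by "less than a third of the space"

# Largest ratio (compressed size / original size) that still fits on the disc.
break_even_ratio = Fraction(cd_capacity, original)

print(compressed_bytes(original, one_third) // MB)       # 500
print(fits_on_disc(original, one_third, cd_capacity))    # True
print(float(break_even_ratio))                           # 0.44
```

Under these assumptions, a database that compresses to one third of its original size needs about 500 MB, and anything up to 44% of the original size would still fit on the 660 MB disc.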

Editor information

Nabil R. Adam, Bharat K. Bhargava, Yelena Yesha

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Witten, I.H., Moffat, A., Bell, T.C. (1995). Compression and full-text indexing for Digital Libraries. In: Adam, N.R., Bhargava, B.K., Yesha, Y. (eds) Digital Libraries Current Issues. DL 1994. Lecture Notes in Computer Science, vol 916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026856

  • DOI: https://doi.org/10.1007/BFb0026856

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59282-2

  • Online ISBN: 978-3-540-49230-6

  • eBook Packages: Springer Book Archive
