Abstract
This chapter has demonstrated the feasibility of full-text indexing of large information bases. The use of modern compression techniques means that there is no space penalty: large document databases can be compressed and indexed in less than a third of the space required by the originals. Surprisingly, there is little or no time penalty either: querying can be faster because less information needs to be read from disk. Simple queries can be answered in a second; more complex ones with more query terms may take a few seconds. One important application is the creation of static databases on CD-ROM, and a 1.5 gigabyte document database can be compressed onto a standard 660 megabyte CD-ROM.
Creating a compressed and indexed document database containing hundreds of thousands of documents and gigabytes of data takes a few hours. Whereas retrieval can be done on ordinary workstations, creation requires a machine with a fair amount of main memory.
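The abstract does not say which index representation or coding scheme the chapter uses. As an illustration only, the sketch below assumes a common approach to compressed full-text indexing: an inverted file whose postings lists are stored as gaps between document numbers, with each gap written in a variable-byte code. The function names (`vbyte_encode`, `build_index`, `lookup`, `conjunctive_query`) are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Encode positive integers, 7 bits per byte; the high bit marks a number's final byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)   # collect low-order 7-bit groups first
            n >>= 7
            if n == 0:
                break
        chunk.reverse()              # emit the high-order group first
        chunk[-1] |= 0x80            # flag the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        if b & 0x80:                 # final byte: finish the current number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers


def build_index(docs):
    """Map each term to a compressed postings list of 1-based document numbers."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for term in set(text.lower().split()):
            postings[term].append(doc_id)   # doc ids arrive in ascending order
    index = {}
    for term, ids in postings.items():
        # Store the first id, then differences between successive ids (small numbers).
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def lookup(index, term):
    """Decode one postings list back into absolute document numbers."""
    ids, total = [], 0
    for gap in vbyte_decode(index.get(term.lower(), b"")):
        total += gap
        ids.append(total)
    return ids


def conjunctive_query(index, terms):
    """Return documents containing every query term (Boolean AND)."""
    result = None
    for term in terms:
        ids = set(lookup(index, term))
        result = ids if result is None else result & ids
    return sorted(result or [])


docs = [
    "compression saves space",
    "full text indexing",
    "compression and indexing together",
]
index = build_index(docs)
print(conjunctive_query(index, ["compression", "indexing"]))
```

Gaps between document numbers are small for frequent terms, so most of them fit in a single byte. This is one reason a compressed index can require less data to be read from disk when answering a query.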
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Witten, I.H., Moffat, A., Bell, T.C. (1995). Compression and full-text indexing for Digital Libraries. In: Adam, N.R., Bhargava, B.K., Yesha, Y. (eds) Digital Libraries Current Issues. DL 1994. Lecture Notes in Computer Science, vol 916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026856
DOI: https://doi.org/10.1007/BFb0026856
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59282-2
Online ISBN: 978-3-540-49230-6
eBook Packages: Springer Book Archive