Abstract
In this article we present a new compression method, called WLZW, which is a word-based modification of classic LZW. The modification is similar to the approach used in the HuffWord compression algorithm. The algorithm works in two phases, and the compression ratio achieved is fairly good, on average 22%-20% (see [2], [3]). Moreover, the table of words, which is a by-product of compression, can be used to create a full-text index, especially for dynamic text databases. The overhead of the index is low.
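The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a two-phase, word-based LZW could look like, not the authors' WLZW. It assumes that phase one splits the text into alternating words and non-words and collects them into a word table, and that phase two runs ordinary LZW over the resulting sequence of word identifiers. The tokenization rule and all function names (`tokenize`, `build_word_table`, `lzw_encode`, `lzw_decode`, `compress`, `decompress`) are hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    # Assumption: a "word" is a maximal run of word characters and a
    # "non-word" is a maximal run of anything else.
    return re.findall(r"\w+|\W+", text)


def build_word_table(tokens):
    # Phase 1: give every distinct token an integer id in order of first use.
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = len(table)
    return table


def lzw_encode(symbols, alphabet_size):
    # Phase 2: classic LZW, except that the alphabet is word ids, not bytes.
    dictionary = {(s,): s for s in range(alphabet_size)}
    phrase = ()
    codes = []
    for s in symbols:
        candidate = phrase + (s,)
        if candidate in dictionary:
            phrase = candidate
        else:
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)
            phrase = (s,)
    if phrase:
        codes.append(dictionary[phrase])
    return codes


def lzw_decode(codes, alphabet_size):
    dictionary = {s: (s,) for s in range(alphabet_size)}
    if not codes:
        return []
    previous = dictionary[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the phrase that is about to be defined.
            entry = previous + (previous[0],)
        out.extend(entry)
        dictionary[len(dictionary)] = previous + (entry[0],)
        previous = entry
    return out


def compress(text):
    tokens = tokenize(text)
    table = build_word_table(tokens)
    symbols = [table[t] for t in tokens]
    # list(table) keeps insertion order, so a token's index equals its id.
    return list(table), lzw_encode(symbols, len(table))


def decompress(words, codes):
    return "".join(words[s] for s in lzw_decode(codes, len(words)))


if __name__ == "__main__":
    words, codes = compress("to be or not to be or not to be")
    print(words)
    print(codes)
    print(decompress(words, codes))
```

In this sketch the word table returned by `compress` must be stored alongside the codes. That table is the kind of by-product the abstract refers to: a list of distinct words to which a full-text index could attach its postings.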
References
1. T. C. Bell et al.: Data Compression in Full-Text Retrieval Systems. Journal of the American Society for Information Science 44(9), 1993, pp. 508–531.
2. J. Dvorský, V. Snášel, J. Pokorný: Word-based Compression Methods for Text Retrieval Systems. Proc. DATASEM'98, Brno 1998
3. J. Dvorský, V. Snášel, J. Pokorný: Word-based Compression Methods for Large Text Documents. Data Compression Conference, DCC '99, Snowbird, Utah, USA.
4. W. B. Frakes, R. Baeza-Yates (Eds.): Information Retrieval: Data Structures & Algorithms. Prentice Hall, 1992
5. G. H. Gonnet, R. Baeza-Yates: Handbook of Algorithms and Data Structures. Addison-Wesley Publishing, 1991
6. R. N. Horspool, G. V. Cormack: Constructing Word-based Text Compression Algorithms. Proc. 2nd IEEE Data Compression Conference, Snowbird, 1992
7. D. Húsek, M. Krejčí, V. Snášel: Compress methods for Text Databases. Cofax 96, Bratislava. (in Czech)
8. B. Melichar, J. Pokorný: Data Compression. Survey Rep. DC-94-07. Czech Technical University, Praha 1994
9. K. Sayood: Introduction to data compression. Morgan Kaufmann Publishing, 1996
10. D. Salomon: Data Compression, Springer Verlag, 1998
11. T. A. Welch: A Technique for High-Performance Data Compression. IEEE Computer 17, 6, 1984, pp. 8–19
12. I. H. Witten, A. Moffat, T. C. Bell: Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.
13. J. Ziv, A. Lempel: A universal algorithm for sequential data compression. IEEE Trans. on Information Theory, Vol. IT-23, No. 3, 1977, pp. 337–343
14. J. Ziv, A. Lempel: Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory, Vol. IT-24, No. 5, 1978, pp. 530–536
© 1999 Springer-Verlag Heidelberg Berlin
Dvorský, J., Pokorný, J., Snášel, V. (1999). Word-Based Compression Methods and Indexing for Text Retrieval Systems. In: Eder, J., Rozman, I., Welzer, T. (eds) Advances in Databases and Information Systems. ADBIS 1999. Lecture Notes in Computer Science, vol 1691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48252-0_6
DOI: https://doi.org/10.1007/3-540-48252-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66485-7
Online ISBN: 978-3-540-48252-9
eBook Packages: Springer Book Archive