
Word-Based Fixed and Flexible List Compression

  • Conference paper
Computer and Information Sciences - ISCIS 2005 (ISCIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3733)


Abstract

We present a dictionary-based lossless text compression scheme in which frequent words are kept in separate lists (list_n contains words of length n). We pursued two alternatives for the sizes of the lists. In the "fixed" approach all lists hold an equal number of words, whereas in the "flexible" approach no such constraint is imposed. Results clearly show that the "flexible" scheme is much better in all test cases, possibly because it can accommodate short, medium, or long word lists that reflect the word length distribution of a particular language. Our approach encodes a word as a prefix (the length of the word) followed by the body of the word (an index into the corresponding list). For prefix encoding we employed both a static encoding and a dynamic encoding (Huffman) that uses the word length statistics of the source language. Dynamic prefix encoding clearly outperformed its static counterpart in all cases. A language with a higher average word length can, theoretically, benefit more from a word-list-based compression approach than one with a lower average word length. We put this hypothesis to the test using Turkish and English, with average word lengths of 6.1 and 4.4, respectively. Our results strongly support this hypothesis.
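
The encoding described in the abstract (a word-length prefix followed by an index into list_n) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed-width binary index, the bit-string output, and the sample sentence are all assumptions made for illustration, and the sketch only handles words that are present in their list (the abstract does not say how other words are treated).

```python
import heapq
from collections import Counter, defaultdict
from itertools import count


def build_lists(words, max_per_list=None):
    """Group distinct words by length, most frequent first.

    max_per_list=None sketches the "flexible" variant (no cap on list size);
    an integer sketches the "fixed" variant (the same cap for every list).
    """
    by_len = defaultdict(Counter)
    for w in words:
        by_len[len(w)][w] += 1
    return {n: [w for w, _ in c.most_common(max_per_list)]
            for n, c in by_len.items()}


def static_codes(length_freqs):
    """Static prefix: every word length gets a code of the same bit width."""
    lengths = sorted(length_freqs)
    width = max(1, (len(lengths) - 1).bit_length())
    return {n: format(i, f"0{width}b") for i, n in enumerate(lengths)}


def huffman_codes(length_freqs):
    """Huffman prefix: frequent word lengths get shorter codes."""
    if len(length_freqs) == 1:
        return {next(iter(length_freqs)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {n: ""}) for n, f in length_freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {n: "0" + b for n, b in c1.items()}
        merged.update({n: "1" + b for n, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def index_width(lst):
    """Bits needed for a fixed-width index into lst (an assumption)."""
    return max(1, (len(lst) - 1).bit_length())


def encode(words, lists, prefix_codes):
    """Encode each word as prefix(length) + index in list_length."""
    bits = []
    for w in words:
        lst = lists[len(w)]
        bits.append(prefix_codes[len(w)]
                    + format(lst.index(w), f"0{index_width(lst)}b"))
    return "".join(bits)


def decode(bits, lists, prefix_codes):
    """Invert encode(); relies on the prefix codes being prefix-free."""
    inverse = {code: n for n, code in prefix_codes.items()}
    out, pos = [], 0
    while pos < len(bits):
        end = pos + 1
        while bits[pos:end] not in inverse:
            end += 1
            if end > len(bits):
                raise ValueError("bit string does not match any prefix code")
        lst = lists[inverse[bits[pos:end]]]
        width = index_width(lst)
        out.append(lst[int(bits[end:end + width], 2)])
        pos = end + width
    return out


# Hypothetical sample text, used only to exercise the functions above.
sample = "the cat sat on the mat and the dog sat on the log".split()
lists = build_lists(sample)                      # flexible lists
length_freqs = Counter(len(w) for w in sample)   # word length statistics
huff = huffman_codes(length_freqs)
static = static_codes(length_freqs)
```

Passing `max_per_list` to `build_lists` gives the "fixed" variant, while the default leaves each list as long as the data requires. Calling `encode` with `huff` or `static` swaps the prefix scheme without touching the list indices, which mirrors the separation between prefix and body in the abstract.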




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Celikel, E., Dalkilic, M.E., Dalkilic, G. (2005). Word-Based Fixed and Flexible List Compression. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_80


  • DOI: https://doi.org/10.1007/11569596_80

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29414-6

  • Online ISBN: 978-3-540-32085-2

  • eBook Packages: Computer Science, Computer Science (R0)
