A lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

Pattern matching is the most widely used technique for compressing printed bi-level text images. In some printed scripts, letters normally attach to each other, some letters have a simple relation to each other, or characters may touch unintentionally. Detecting such situations and exploiting them to reduce the library size has a considerable effect on the compression ratio. In this paper, a lossy/lossless compression method for printed typeset bi-level text images is proposed for archiving purposes. It combines three techniques. First, the number of library prototypes is reduced by detecting and exploiting the situations mentioned above. Second, a new, effective encoding scheme is proposed for patterns and numbers. Third, three levels of lossy compression are proposed. Experimental results at 300 dpi show that the proposed method compresses 1.4–3.3 times better in the lossy case and 1.2–2.7 times better in the lossless case than the best existing compression methods or standards.
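To make the role of the prototype library concrete, the following is a minimal sketch of generic pattern matching and substitution, not the method proposed in the paper. It assumes that marks (connected components) have already been extracted as small bitmaps, and it uses a plain pixel-mismatch count as the matching criterion. The names `Bitmap`, `mismatch`, `build_library`, and the `max_mismatch` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: a mark is a tuple of rows, each row a string of '0'/'1' pixels.
Bitmap = Tuple[str, ...]


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Return the number of differing pixels, or -1 if the sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return -1
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_library(marks: List[Bitmap], max_mismatch: int):
    """Replace each mark by the index of the first prototype that matches it.

    A mark that matches no existing prototype becomes a new prototype.
    With max_mismatch > 0 the substitution is lossy: the decoder would
    reproduce the prototype, not the original mark.
    """
    library: List[Bitmap] = []
    indices: List[int] = []
    for mark in marks:
        for i, proto in enumerate(library):
            d = mismatch(mark, proto)
            if 0 <= d <= max_mismatch:
                indices.append(i)
                break
        else:
            # No prototype was close enough, so store this mark as a new one.
            library.append(mark)
            indices.append(len(library) - 1)
    return library, indices


# Toy 3x3 marks: two clean copies of one shape, a noisy copy, and another shape.
A = ("010", "111", "101")
A_NOISY = ("010", "111", "100")  # differs from A in one pixel
B = ("110", "101", "110")
MARKS = [A, A_NOISY, B, A]

lossy_library, lossy_indices = build_library(MARKS, max_mismatch=1)
exact_library, exact_indices = build_library(MARKS, max_mismatch=0)
```

In this toy input, allowing one mismatching pixel folds the noisy mark into an existing prototype, so the library shrinks from three entries to two. A smaller library means fewer bitmaps to store explicitly, which is the effect the abstract attributes to reducing the number of prototypes.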

Author information

Corresponding author

Correspondence to Mojtaba Lotfizad.

Additional information

This work was supported by the Iranian Telecommunication Research Center (ITRC).

About this article

Cite this article

Grailu, H., Lotfizad, M. & Sadoghi-Yazdi, H. A lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching. IJDAR 11, 159–182 (2009). https://doi.org/10.1007/s10032-008-0075-3

  • DOI: https://doi.org/10.1007/s10032-008-0075-3
