MC-JBIG2: an improved algorithm for Chinese textual image compression

  • Full Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Standard JBIG2 algorithms for textual image compression are designed around the features of alphabetic scripts such as English and do not consider the features of pictographic characters such as Chinese. This work develops an improved algorithm, MC-JBIG2, that aims to improve the compression ratio for Chinese textual images. The method first extracts multiple features from the characters in the images. A cascade of clusters then performs the pattern-matching task for the characters. Finally, a Monte Carlo strategy traverses the feasible space to optimize the parameters used in the cascade of clusters. Experimental results show that MC-JBIG2 outperforms existing representative JBIG2 algorithms and systems on Chinese textual images. MC-JBIG2 can also improve the compression ratio on Latin textual images; however, that improvement is not as stable as the improvement on Chinese ones.
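The three steps named in the abstract (feature extraction, cascaded matching, Monte Carlo parameter search) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features, cascade stages, or objective MC-JBIG2 uses, so the ink-count feature, the two thresholds `ink_tol` and `pixel_tol`, the greedy clustering, and the toy `cost` function below are all assumptions chosen for illustration. Glyphs are assumed to be small binary bitmaps given as lists of rows.

```python
import random

def features(glyph):
    """Cheap per-glyph features (assumed): height, width, black-pixel count."""
    return len(glyph), len(glyph[0]), sum(map(sum, glyph))

def hamming(a, b):
    """Number of differing pixels between two bitmaps of the same shape."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def matches(glyph, proto, params):
    """Cascade: cheap stages reject early, only survivors reach the pixel test."""
    ink_tol, pixel_tol = params
    gh, gw, gink = features(glyph)
    ph, pw, pink = features(proto)
    if (gh, gw) != (ph, pw):           # stage 1: bitmap shape must agree
        return False
    if abs(gink - pink) > ink_tol:     # stage 2: ink count must be close
        return False
    return hamming(glyph, proto) <= pixel_tol  # stage 3: pixel-wise distance

def build_dictionary(glyphs, params):
    """Greedy clustering: a glyph matching no prototype becomes a new one."""
    protos, assignment = [], []
    for g in glyphs:
        for i, p in enumerate(protos):
            if matches(g, p, params):
                assignment.append(i)
                break
        else:
            protos.append(g)
            assignment.append(len(protos) - 1)
    return protos, assignment

def cost(glyphs, params, distortion_weight=1.0):
    """Toy stand-in for coded size: stored prototype pixels plus weighted
    substitution error. A real system would measure the encoded file size."""
    protos, assignment = build_dictionary(glyphs, params)
    stored = sum(len(p) * len(p[0]) for p in protos)
    error = sum(hamming(g, protos[i]) for g, i in zip(glyphs, assignment))
    return stored + distortion_weight * error

def monte_carlo_search(glyphs, bounds, trials=200, seed=0):
    """Sample parameter tuples uniformly from the feasible box, keep the best."""
    rng = random.Random(seed)
    best_params, best_cost = None, float("inf")
    for _ in range(trials):
        params = tuple(rng.randint(lo, hi) for lo, hi in bounds)
        c = cost(glyphs, params)
        if c < best_cost:
            best_params, best_cost = params, c
    return best_params, best_cost
```

Calling `monte_carlo_search(glyphs, bounds=[(0, 4), (0, 4)])` returns the sampled threshold pair with the lowest toy cost. Uniform random sampling is only one possible reading of "Monte Carlo strategy"; the paper may use a different sampling scheme.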

Author information

Corresponding author

Correspondence to Kui Hu.

About this article

Cite this article

Hu, K., Tang, Z., Gao, L. et al. MC-JBIG2: an improved algorithm for Chinese textual image compression. IJDAR 13, 271–284 (2010). https://doi.org/10.1007/s10032-010-0126-4

  • DOI: https://doi.org/10.1007/s10032-010-0126-4
