MC-JBIG2: an improved algorithm for Chinese textual image compression

Hu, Kui; Tang, Zhi; Gao, Liangcai; Mu, Yadong

doi:10.1007/s10032-010-0126-4

Full Paper
Published: 18 August 2010

International Journal on Document Analysis and Recognition (IJDAR)
Volume 13, pages 271–284 (2010)

Kui Hu¹, Zhi Tang¹, Liangcai Gao¹ & Yadong Mu²

Abstract

Standard JBIG2 algorithms for textual image compression focus on the features of alphabetic scripts such as English and do not consider the features of pictographic scripts such as Chinese. In this work, an improved algorithm called MC-JBIG2 is developed, which aims to improve the compression ratio for Chinese textual images. In the proposed method, multiple features are first extracted from the characters in the images. A cascade of clusters is then introduced to accomplish the pattern-matching task for the characters. Finally, to optimize the parameters used in the cascade of clusters, a Monte Carlo strategy is implemented to traverse the feasible space. Experimental results show that MC-JBIG2 outperforms existing representative JBIG2 algorithms and systems on Chinese textual images. MC-JBIG2 can also improve the compression ratio on Latin textual images; however, the improvement there is not as stable as the improvement on Chinese ones.
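
The three stages named in the abstract (feature extraction, a cascade of clusters, and a Monte Carlo search over the cascade's parameters) can be illustrated with a small sketch. The abstract does not specify the features, the cascade order, or the objective, so everything concrete below is an assumption made for illustration: the features (bitmap size and ink count), the tolerances `ink_tol` and `pix_tol`, the greedy clustering, the labelled toy glyphs, and the cost function are all hypothetical, and the search is plain uniform random sampling, which may differ from the strategy used in the paper.

```python
import random

def features(glyph):
    """Cheap per-glyph features (assumed): height, width, black-pixel count."""
    return len(glyph), len(glyph[0]), sum(sum(row) for row in glyph)

def hamming(a, b):
    """Number of mismatching pixels between two equal-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def cascade_match(a, b, params):
    """Reject cheaply first: size, then ink count, then pixel comparison."""
    ink_tol, pix_tol = params
    ha, wa, ka = features(a)
    hb, wb, kb = features(b)
    if (ha, wa) != (hb, wb):          # stage 1: bitmap size must agree
        return False
    if abs(ka - kb) > ink_tol:        # stage 2: ink counts must be close
        return False
    return hamming(a, b) <= pix_tol   # stage 3: pixel-level comparison

def cluster(glyphs, params):
    """Greedy clustering: each glyph joins the first matching representative."""
    reps, labels = [], []
    for g in glyphs:
        for i, r in enumerate(reps):
            if cascade_match(g, r, params):
                labels.append(i)
                break
        else:
            reps.append(g)
            labels.append(len(reps) - 1)
    return reps, labels

def cost(glyphs, params, truth):
    """Assumed objective: dictionary size plus a penalty for wrong merges."""
    reps, labels = cluster(glyphs, params)
    first_truth = {}
    wrong = 0
    for label, t in zip(labels, truth):
        if first_truth.setdefault(label, t) != t:
            wrong += 1
    return len(reps) + 10 * wrong

def monte_carlo_search(glyphs, truth, trials=200, seed=0):
    """Sample tolerances uniformly at random and keep the cheapest setting."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        params = (rng.randint(0, 9), rng.randint(0, 9))
        c = cost(glyphs, params, truth)
        if best is None or c < best[0]:
            best = (c, params)
    return best

# Toy data (hypothetical): a glyph, a noisy copy of it, and a different glyph.
GLYPHS = [
    ((1, 1, 1), (1, 0, 1), (1, 1, 1)),
    ((1, 1, 1), (1, 0, 1), (1, 1, 0)),
    ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
]
TRUTH = [0, 0, 1]

if __name__ == "__main__":
    print(monte_carlo_search(GLYPHS, TRUTH))
```

Tolerances that are too strict keep the noisy copy as a separate dictionary entry, while tolerances that are too loose merge different characters, which is the trade-off a parameter search of this kind has to balance.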

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, 100871, Beijing, China
Kui Hu, Zhi Tang & Liangcai Gao
Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 117576, Singapore
Yadong Mu

Corresponding author

Correspondence to Kui Hu.

About this article

Cite this article

Hu, K., Tang, Z., Gao, L. et al. MC-JBIG2: an improved algorithm for Chinese textual image compression. IJDAR 13, 271–284 (2010). https://doi.org/10.1007/s10032-010-0126-4

Received: 15 December 2009
Revised: 24 May 2010
Accepted: 05 August 2010
Published: 18 August 2010
Issue Date: December 2010
DOI: https://doi.org/10.1007/s10032-010-0126-4
