Abstract
Standard JBIG2 algorithms for textual image compression are designed around the features of alphabetic scripts such as English and do not account for the features of pictographic scripts such as Chinese. This work presents MC-JBIG2, an improved algorithm that aims to raise the compression ratio for Chinese textual images. The method first extracts multiple features from the characters in an image. A cascade of clusters then performs the pattern-matching task for the characters. Finally, a Monte Carlo strategy traverses the feasible space to optimize the parameters used in the cascade of clusters. Experimental results show that MC-JBIG2 outperforms existing representative JBIG2 algorithms and systems on Chinese textual images. MC-JBIG2 can also improve the compression ratio on Latin textual images, although that improvement is less stable than the improvement on Chinese ones.
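The three steps named in the abstract (feature extraction, a cascade of matching stages, and Monte Carlo parameter search) can be illustrated with a small sketch. It is not the MC-JBIG2 implementation: the features (glyph size and black-pixel count), the two thresholds, the greedy clustering, and the toy cost function are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random

def features(bitmap):
    """Cheap features of a binary glyph: (height, width, black-pixel count)."""
    h, w = len(bitmap), len(bitmap[0])
    return h, w, sum(map(sum, bitmap))

def hamming(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def cascade_match(glyph, proto, max_count_diff, max_hamming):
    """Run cheap tests first; the pixel comparison runs only if they pass."""
    gh, gw, gc = features(glyph)
    ph, pw, pc = features(proto)
    if (gh, gw) != (ph, pw):                      # stage 1: glyph size
        return False
    if abs(gc - pc) > max_count_diff:             # stage 2: black-pixel count
        return False
    return hamming(glyph, proto) <= max_hamming   # stage 3: pixel distance

def build_dictionary(glyphs, max_count_diff, max_hamming):
    """Greedy clustering: each glyph joins the first prototype it matches."""
    protos, errors = [], 0
    for g in glyphs:
        for p in protos:
            if cascade_match(g, p, max_count_diff, max_hamming):
                errors += hamming(g, p)           # pixels lost by substitution
                break
        else:
            protos.append(g)                      # no match: new prototype
    return protos, errors

def cost(glyphs, params, error_weight=4):
    """Toy cost (an assumption): stored prototype pixels plus error penalty."""
    protos, errors = build_dictionary(glyphs, *params)
    stored = sum(len(p) * len(p[0]) for p in protos)
    return stored + error_weight * errors

def monte_carlo_search(glyphs, trials=200, seed=0):
    """Sample thresholds at random from the feasible range; keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        params = (rng.randint(0, 4), rng.randint(0, 4))
        c = cost(glyphs, params)
        if best is None or c < best[0]:
            best = (c, params)
    return best
```

Ordering the stages from cheapest to most expensive means most non-matching prototype candidates are rejected before any pixel-wise comparison. The random search stands in for the abstract's Monte Carlo traversal of the feasible parameter space. A real encoder would measure the coded bitstream size rather than this toy cost.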
Cite this article
Hu, K., Tang, Z., Gao, L. et al. MC-JBIG2: an improved algorithm for Chinese textual image compression. IJDAR 13, 271–284 (2010). https://doi.org/10.1007/s10032-010-0126-4
DOI: https://doi.org/10.1007/s10032-010-0126-4