A lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching

Grailu, Hadi; Lotfizad, Mojtaba; Sadoghi-Yazdi, Hadi

doi:10.1007/s10032-008-0075-3

Original Paper
Published: 23 January 2009

Volume 11, pages 159–182, (2009)

International Journal of Document Analysis and Recognition (IJDAR)

Hadi Grailu¹,
Mojtaba Lotfizad¹ &
Hadi Sadoghi-Yazdi²,³

Abstract

Pattern matching is the most widely used technique for compressing printed bi-level text images. In some printed scripts, letters normally attach to each other, some letters have a simple relation to one another, or characters touch undesirably. Detecting such situations and exploiting them to reduce the library size has a considerable effect on the compression ratio. In this paper, a lossy/lossless compression method for printed typeset bi-level text images is proposed for archiving purposes. The method rests on three techniques. First, the number of library prototypes is reduced by detecting and exploiting the situations mentioned above. Second, a new, effective encoding scheme is proposed for patterns and numbers. Third, three levels of lossy compression are proposed. Experimental results show that, at 300 dpi, the proposed method outperforms the best existing compression methods or standards by a factor of 1.4–3.3 in the lossy case and 1.2–2.7 in the lossless case.
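
To make the idea of a prototype library concrete, the following is a minimal sketch of generic pattern matching and substitution, not the method proposed in the paper. It assumes patterns have already been segmented into equally represented bitmaps, and it uses a plain pixel-mismatch count as the matching criterion. The names `mismatch`, `build_library`, and `max_mismatch` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: a pattern is a tuple of rows, each row a string of '0'/'1' pixels.
Bitmap = Tuple[str, ...]


def mismatch(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Count differing pixels, or return None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_library(
    patterns: Sequence[Bitmap], max_mismatch: int = 0
) -> Tuple[List[Bitmap], List[int]]:
    """Replace each pattern by the index of the first matching prototype.

    A pattern that matches no existing prototype becomes a new prototype.
    max_mismatch=0 only merges identical bitmaps; larger values also merge
    near-identical ones, which is lossy if nothing else is encoded.
    """
    library: List[Bitmap] = []
    indices: List[int] = []
    for pat in patterns:
        for i, proto in enumerate(library):
            d = mismatch(pat, proto)
            if d is not None and d <= max_mismatch:
                indices.append(i)
                break
        else:
            # No prototype was close enough: add this pattern to the library.
            library.append(pat)
            indices.append(len(library) - 1)
    return library, indices
```

With a threshold of zero, a page is represented by the library plus one index per pattern occurrence, so fewer prototypes means less data to store. Raising the threshold shrinks the library further at the cost of replacing some patterns by slightly different prototypes.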

Author information

Authors and Affiliations

Engineering Department, Tarbiat Modarres University, Tehran, Iran
Hadi Grailu & Mojtaba Lotfizad
Engineering Department, Tarbiat Moallem University of Sabzevar, Sabzevar, Iran
Hadi Sadoghi-Yazdi
Computer Department, Ferdowsi University of Mashad, Mashad, Iran
Hadi Sadoghi-Yazdi

Corresponding author

Correspondence to Mojtaba Lotfizad.

Additional information

This work was supported by the Iranian Telecommunication Research Center (ITRC).

About this article

Cite this article

Grailu, H., Lotfizad, M. & Sadoghi-Yazdi, H. A lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching. IJDAR 11, 159–182 (2009). https://doi.org/10.1007/s10032-008-0075-3

Received: 03 March 2008
Revised: 16 August 2008
Accepted: 29 October 2008
Published: 23 January 2009
Issue Date: March 2009
DOI: https://doi.org/10.1007/s10032-008-0075-3
