Abstract
Pattern matching is the most widely used technique for compressing printed bi-level text images. In some printed scripts, letters normally attach to one another, some letters are simply related to others, or characters may touch undesirably. Detecting these situations and exploiting them to reduce the library size has a considerable effect on the compression ratio. In this paper, a lossy/lossless compression method for printed typeset bi-level text images is proposed for archiving purposes. It relies on three techniques. First, the number of library prototypes is reduced by detecting and exploiting the situations mentioned above. Second, a new, effective encoding scheme is proposed for patterns and numbers. Third, three levels of lossy compression are proposed. Experimental results at 300 dpi show that the proposed method outperforms the best existing compression methods and standards by factors of 1.4–3.3 in the lossy case and 1.2–2.7 in the lossless case.
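The following sketch illustrates the classic pattern matching and substitution idea that this kind of method builds on. It is not the authors' improved algorithm, which the abstract does not specify. Each extracted symbol bitmap is compared against a growing library of prototypes. If it matches one closely enough, it is coded as that prototype's index; otherwise it becomes a new prototype. Several choices here are simplifying assumptions: the mismatch measure (a pixel-wise XOR count), the requirement that only equal-sized bitmaps are compared, and the `max_error` threshold expressed as a fraction of the bitmap area. The helper names `pattern_match` and `mismatch` are hypothetical.

```python
from typing import List, Tuple

# A bi-level symbol bitmap: rows of 0/1 pixels (assumed representation).
Bitmap = Tuple[Tuple[int, ...], ...]


def same_size(a: Bitmap, b: Bitmap) -> bool:
    """True if both bitmaps have the same height and width."""
    return len(a) == len(b) and len(a[0]) == len(b[0])


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels (XOR count) between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def pattern_match(symbols: List[Bitmap], max_error: float) -> Tuple[List[Bitmap], List[int]]:
    """Build a prototype library and an index stream (hypothetical sketch).

    A symbol reuses the closest same-sized prototype when its mismatch is at
    most max_error * area; otherwise it is added to the library.
    """
    library: List[Bitmap] = []
    indices: List[int] = []
    for sym in symbols:
        area = len(sym) * len(sym[0])
        best, best_err = -1, area + 1
        for i, proto in enumerate(library):
            if not same_size(sym, proto):
                continue  # this simple sketch only compares equal-sized bitmaps
            err = mismatch(sym, proto)
            if err < best_err:
                best, best_err = i, err
        if best >= 0 and best_err <= max_error * area:
            indices.append(best)  # substitute the symbol with an existing prototype
        else:
            library.append(sym)  # no close match: store a new prototype
            indices.append(len(library) - 1)
    return library, indices


# Usage: the noisy "A" differs from "A" in one pixel, so it reuses prototype 0.
A = ((0, 1, 0), (1, 1, 1), (1, 0, 1))
A_noisy = ((0, 1, 0), (1, 1, 1), (1, 1, 1))
B = ((1, 1, 0), (1, 0, 1), (1, 1, 0))
lib, idx = pattern_match([A, B, A_noisy, A], max_error=0.15)
print(len(lib), idx)  # -> 2 [0, 1, 0, 0]
decoded = [lib[i] for i in idx]  # lossy reconstruction from prototypes
```

In this sketch, substituting each symbol with its matched prototype is lossy. A lossless variant would also have to code the residual between each symbol and its prototype. Storing fewer prototypes means fewer bitmaps to encode, which is why reducing the library size improves the compression ratio.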
Additional information
This work was supported by the Iranian Telecommunication Research Center (ITRC).
About this article
Cite this article
Grailu, H., Lotfizad, M. & Sadoghi-Yazdi, H. A lossy/lossless compression method for printed typeset bi-level text images based on improved pattern matching. IJDAR 11, 159–182 (2009). https://doi.org/10.1007/s10032-008-0075-3
DOI: https://doi.org/10.1007/s10032-008-0075-3