On foreground — background separation in low quality document images

Garain, Utpal; Paquet, Thierry; Heutte, Laurent

doi:10.1007/s10032-005-0007-4

Original Paper
Published: 10 February 2006

Volume 8, pages 47–63 (2006)

International Journal of Document Analysis and Recognition (IJDAR)

Utpal Garain¹,
Thierry Paquet² &
Laurent Heutte²

Abstract

This paper deals with effective separation of foreground and background in low-quality document images suffering from various types of degradation, including scanning noise, aging effects, and uneven background or foreground. The proposed algorithm shows excellent adaptability in tackling uneven illumination and local changes or nonuniformity in background and foreground colors. The approach is primarily designed for, but not restricted to, processing color documents; it also works well in the gray-scale domain. The test set comprises samples, in color as well as in gray scale, of old historical documents, including manuscripts of high importance. The data set used in this study consists of one hundred images, selected from different sources including image databases scanned from the working notebooks of famous writers who wrote with quill or pencil, producing very low contrast between foreground and background. The foreground extraction method is evaluated by computing the accuracy of extracting handwritten lines and words from the test images. This evaluation shows that the proposed method can extract lines and words with accuracies of about 84% and 93%, respectively. In addition to this quantitative evaluation, a qualitative evaluation compares the proposed method with one popular technique for foreground/background separation in document images.

Author information

Authors and Affiliations

Computer Vision & Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road, 700108, Kolkata, INDIA
Utpal Garain
Laboratoire PSI - FRE CNRS 2645, UFR des Sciences, University of Rouen, 76821, Mont Saint Aignan cedex, FRANCE
Thierry Paquet & Laurent Heutte

Corresponding author

Correspondence to Utpal Garain.

Additional information

Utpal Garain received both his B.E. and M.E. in Computer Science and Engineering from Jadavpur University, Kolkata, in 1994 and 1997, respectively, and his PhD from the Indian Statistical Institute, Kolkata, in 2005. Mr. Garain started his career as a software professional in industry and later joined the Indian Statistical Institute as a researcher, where he is currently a full-time faculty member. He is one of the key scientists involved in the development of a bilingual (Devanagari & Bangla) OCR system, the first of its kind in India. Mr. Garain's areas of interest are in digital document processing, including optical character recognition for Indian language scripts, online character recognition, document data compression, artificial immune systems, etc.

Thierry Paquet received the Ph.D. degree from the University of Rouen in 1992 in the field of Pattern Recognition. From 1992 to 2002 he was a Senior Lecturer at the University of Rouen, where he taught Signal and Image Processing. From 1992 to 1996 he was involved in an industrial collaboration with Matra MCS and the French Postal Research Center (SRTP) on the automation of mail sorting and bank check reading. During this period he also worked on stochastic models and Information Criteria. He was appointed as a full professor at the University of Rouen in 2002. His current research areas concern handwritten document processing, including biometry, writer adaptation of recognition systems, handwritten document categorization for industrial purposes, and complex layout analysis for historical document analysis. Thierry Paquet is the president of the French association for research in written communication (GRCE).

Laurent Heutte (born 30/05/1964) received his Ph.D. degree in Computer Engineering from the University of Rouen, France, in 1994. From 1996 to 2004, he was a Senior Lecturer in Computer Engineering and Automatic Control at the University of Rouen. Since 2004, he has been a Professor at the same university. Professor Heutte's present research interests are multiple classifier systems, off-line cursive handwriting analysis and recognition, handwritten document layout analysis, and information extraction from handwritten documents. Since 2003, he has been an Associate Editor of the Pattern Recognition journal and the representative member of the French association of pattern recognition (AFRIF) on the Governing Board of the IAPR.

About this article

Cite this article

Garain, U., Paquet, T. & Heutte, L. On foreground — background separation in low quality document images. IJDAR 8, 47–63 (2006). https://doi.org/10.1007/s10032-005-0007-4

Received: 02 December 2004
Revised: 01 August 2005
Accepted: 10 October 2005
Published: 10 February 2006
Issue Date: April 2006
DOI: https://doi.org/10.1007/s10032-005-0007-4
