Blind bleed-through removal in color ancient manuscripts

Multimedia Tools and Applications

Abstract

Ancient manuscripts are an important part of the heritage of past civilizations. Unfortunately, such documents are often affected by various age-related degradations, which impair their legibility and information content and destroy their original look. In general, these documents are composed of three layers of information: foreground text, background, and unwanted degradation in the form of patterns interfering with the main text. In this work, we present a color-space-based image segmentation technique to separate and remove the bleed-through degradation in digital ancient manuscripts. The aim is to improve their readability and restore their original aesthetic look. For each pixel, a feature vector is created using color spectral and spatial location information. A pixel-based segmentation method using a Gaussian Mixture Model (GMM) is employed, assuming that the feature vectors of each class follow a Gaussian distribution. Under this assumption, each pixel is taken to be drawn from a mixture of Gaussian distributions with unknown parameters. The Expectation-Maximization (EM) approach is then used to estimate the unknown GMM parameters. The appropriate class label for each pixel is then estimated using the posterior probability and the GMM parameters. Unlike binarization-based document restoration methods, where the focus is on text extraction, we are more interested in restoring the aesthetically pleasing look of the ancient documents. The experimental results validate the usefulness of the proposed method in terms of successful bleed-through identification and removal, while preserving foreground-text and background information.
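
The pipeline described in the abstract (per-pixel color and location features, a Gaussian mixture fitted by EM, and labeling by maximum posterior) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The diagonal covariances, the three components, the intensity-based initialization, the rule that maps components to text, bleed-through and background by brightness, and the fill with the background mean color are all assumptions made here for illustration. The input is a synthetic striped image, not manuscript data.

```python
import numpy as np


def pixel_features(image):
    """Return one feature vector per pixel: color channels plus normalized (row, col)."""
    h, w, c = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows / max(h - 1, 1), cols / max(w - 1, 1)], axis=-1)
    feats = np.concatenate([image.astype(float), coords], axis=-1)
    return feats.reshape(h * w, c + 2)


def posteriors(X, weights, means, variances):
    """Posterior probability of each component for each row of X (diagonal Gaussians)."""
    diff = X[:, None, :] - means[None, :, :]
    log_p = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                    + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_p = log_p + np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_gmm_em(X, n_colors, k=3, n_iter=30, eps=1e-6):
    """Estimate mixture weights, means and diagonal variances with plain EM."""
    # Assumed initialization: split pixels into k groups by mean color intensity.
    order = np.argsort(X[:, :n_colors].mean(axis=1))
    means = np.stack([X[g].mean(axis=0) for g in np.array_split(order, k)])
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = posteriors(X, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + eps                      # M-step
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return weights, means, variances


# Synthetic test image (hypothetical data): three vertical stripes plus mild noise.
rng = np.random.default_rng(0)
h, w = 24, 36
image = np.empty((h, w, 3))
image[:, :12] = [0.90, 0.85, 0.70]    # parchment-like background
image[:, 12:24] = [0.60, 0.50, 0.40]  # faint bleed-through
image[:, 24:] = [0.10, 0.10, 0.10]    # dark foreground ink
image = np.clip(image + rng.normal(0.0, 0.02, image.shape), 0.0, 1.0)

X = pixel_features(image)
weights, means, variances = fit_gmm_em(X, n_colors=3)
labels = posteriors(X, weights, means, variances).argmax(axis=1).reshape(h, w)

# Assumption: darkest component is text, brightest is background, the middle one
# is bleed-through. Bleed-through pixels are filled with the background mean color.
text_id, bleed_id, background_id = np.argsort(means[:, :3].mean(axis=1))
restored = image.copy()
restored[labels == bleed_id] = means[background_id, :3]
```

On real scans the brightness ordering of the three classes may not hold, and a flat background fill ignores parchment texture, so both steps would need to be replaced by whatever the full paper specifies.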

Data Availability

The data used in the experimental section of this paper are publicly available at https://www.isos.dias.ie/.

Notes

  1. http://www.isos.dias.ie

Funding

No funds or grants were received.

Author information

Corresponding author

Correspondence to Muhammad Hanif.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by ERCIM.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hanif, M., Tonazzini, A., Hussain, S.F. et al. Blind bleed-through removal in color ancient manuscripts. Multimed Tools Appl 82, 12321–12335 (2023). https://doi.org/10.1007/s11042-022-13755-6

  • DOI: https://doi.org/10.1007/s11042-022-13755-6
