Skip to main content

CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization

  • Conference paper
  • First Online:
PRICAI 2023: Trends in Artificial Intelligence (PRICAI 2023)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14325))

Included in the following conference series:

Abstract

To efficiently extract textual information from color degraded document images is a significant research area. The prolonged imperfect preservation of ancient documents has led to various types of degradation, such as page staining, paper yellowing, and ink bleeding. These types of degradation badly impact the image processing for features extraction. This paper introduces a novelty method employing generative adversarial networks based on color channel using discrete wavelet transform (CCDWT-GAN). The proposed method involves three stages: image preprocessing, image enhancement, and image binarization. In the initial step, we apply discrete wavelet transform (DWT) to retain the low-low (LL) subband image, thereby enhancing image quality. Subsequently, we divide the original input image into four single-channel colors (red, green, blue, and gray) to separately train adversarial networks. For the extraction of global and local features, we utilize the output image from the image enhancement stage and the entire input image to train adversarial networks independently, and then combine these two results as the final output. To validate the positive impact of the image enhancement and binarization stages on model performance, we conduct an ablation study. This work compares the performance of the proposed method with other state-of-the-art (SOTA) methods on DIBCO and H-DIBCO ((Handwritten) Document Image Binarization Competition) datasets. The experimental results demonstrate that CCDWT-GAN achieves a top two performance on multiple benchmark datasets. Notably, on DIBCO 2013 and 2016 dataset, our method achieves F-measure (FM) values of 95.24 and 91.46, respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Bartusiak, E.R., et al.: Splicing detection and localization in satellite imagery using conditional GANs. In: 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 91–96. IEEE (2019)

    Google Scholar 

  2. Bera, S.K., Ghosh, S., Bhowmik, S., Sarkar, R., Nasipuri, M.: A non-parametric binarization method based on ensemble of clustering algorithms. Multimed. Tools Appl. 80(5), 7653–7673 (2021)

    Article  Google Scholar 

  3. Bhunia, A.K., Bhunia, A.K., Sain, A., Roy, P.P.: Improving document binarization via adversarial noise-texture augmentation. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 2721–2725. IEEE (2019)

    Google Scholar 

  4. De, R., Chakraborty, A., Sarkar, R.: Document image binarization using dual discriminator generative adversarial networks. IEEE Signal Process. Lett. 27, 1090–1094 (2020)

    Article  Google Scholar 

  5. Deng, F., Wu, Z., Lu, Z., Brown, M.S.: Binarizationshop: a user-assisted software suite for converting old documents to black-and-white. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, pp. 255–258 (2010)

    Google Scholar 

  6. Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR 2009 document image binarization contest (DIBCO 2009). In: 2009 10th International Conference on Document Analysis and Recognition, pp. 1375–1382. IEEE (2009)

    Google Scholar 

  7. Goodfellow, I., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)

    Article  MathSciNet  Google Scholar 

  8. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

    Google Scholar 

  9. Guo, J., He, C., Zhang, X.: Nonlinear edge-preserving diffusion with adaptive source for document images binarization. Appl. Math. Comput. 351, 8–22 (2019)

    MathSciNet  MATH  Google Scholar 

  10. He, S., Schomaker, L.: DeepOtsu: document enhancement and binarization using iterative deep learning. Pattern Recogn. 91, 379–390 (2019)

    Article  Google Scholar 

  11. Hedjam, R., Cheriet, M.: Historical document image restoration using multispectral imaging system. Pattern Recogn. 46(8), 2297–2312 (2013)

    Article  Google Scholar 

  12. Howe, N.R.: Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. (IJDAR) 16, 247–258 (2013)

    Article  Google Scholar 

  13. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)

    Google Scholar 

  14. Jemni, S.K., Souibgui, M.A., Kessentini, Y., Fornés, A.: Enhance to read better: a multi-task adversarial network for handwritten document image enhancement. Pattern Recogn. 123, 108370 (2022)

    Article  Google Scholar 

  15. Jia, F., Shi, C., He, K., Wang, C., Xiao, B.: Degraded document image binarization using structural symmetry of strokes. Pattern Recogn. 74, 225–240 (2018)

    Article  Google Scholar 

  16. Kligler, N., Katz, S., Tal, A.: Document enhancement using visibility detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2374–2382 (2018)

    Google Scholar 

  17. Nafchi, H.Z., Ayatollahi, S.M., Moghaddam, R.F., Cheriet, M.: An efficient ground truthing tool for binarization of historical manuscripts. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 807–811. IEEE (2013)

    Google Scholar 

  18. Niblack, W.: An Introduction to Digital Image Processing. Strandberg Publishing Company, Birkeroed (1985)

    Google Scholar 

  19. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)

    Article  Google Scholar 

  20. Pratikakis, I., Gatos, B., Ntirogiannis, K.: H-DIBCO 2010-handwritten document image binarization competition. In: 2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 727–732. IEEE (2010)

    Google Scholar 

  21. Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2011 document image binarization contest (DIBCO 2011). In: 2011 International Conference on Document Analysis and Recognition, pp. 1506–1510. IEEE (2011)

    Google Scholar 

  22. Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICFHR 2012 competition on handwritten document image binarization (H-DIBCO 2012). In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 817–822. IEEE (2012)

    Google Scholar 

  23. Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2013 document image binarization contest (DIBCO 2013). In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1471–1476. IEEE (2013)

    Google Scholar 

  24. Pratikakis, I., Zagoris, K., Barlas, G., Gatos, B.: ICFHR 2016 handwritten document image binarization contest (H-DIBCO 2016). In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 619–623. IEEE (2016)

    Google Scholar 

  25. Pratikakis, I., Zagoris, K., Barlas, G., Gatos, B.: ICDAR 2017 competition on document image binarization (DIBCO 2017). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1395–1403. IEEE (2017)

    Google Scholar 

  26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

    Chapter  Google Scholar 

  27. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recogn. 33(2), 225–236 (2000)

    Article  Google Scholar 

  28. Suh, S., Kim, J., Lukowicz, P., Lee, Y.O.: Two-stage generative adversarial networks for binarization of color document images. Pattern Recogn. 130, 108810 (2022)

    Article  Google Scholar 

  29. Suh, S., Lee, H., Lukowicz, P., Lee, Y.O.: CEGAN: classification enhancement generative adversarial networks for unraveling data imbalance problems. Neural Netw. 133, 69–86 (2021)

    Article  Google Scholar 

  30. Sulaiman, A., Omar, K., Nasrudin, M.F.: Degraded historical document binarization: a review on issues, challenges, techniques, and future directions. J. Imaging 5(4), 48 (2019)

    Article  Google Scholar 

  31. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114 (2019)

    Google Scholar 

  32. Tensmeyer, C., Martinez, T.: Document image binarization with fully convolutional neural networks. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 99–104. IEEE (2017)

    Google Scholar 

  33. Vo, Q.N., Kim, S.H., Yang, H.J., Lee, G.: Binarization of degraded document images based on hierarchical deep supervised network. Pattern Recogn. 74, 568–586 (2018)

    Article  Google Scholar 

  34. Yang, Z., Xiong, Y., Wu, G.: GDB: gated convolutions-based document binarization. arXiv preprint arXiv:2302.02073 (2023)

  35. Zhao, J., Shi, C., Jia, F., Wang, Y., Xiao, B.: Document image binarization with cascaded generators of conditional generative adversarial networks. Pattern Recogn. 96, 106968 (2019)

    Article  Google Scholar 

  36. Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39(6), 1856–1867 (2019)

    Article  Google Scholar 

Download references

Acknowledgment

This work is supported by National Science and Technology Council of Taiwan, under Grant Number: NSTC 112-2221-E-032-037-MY2.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Rui-Yang Ju or Jen-Shiun Chiang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ju, RY., Lin, YS., Chiang, JS., Chen, CC., Chen, WH., Chien, CT. (2024). CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science(), vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_19

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-7019-3_19

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7018-6

  • Online ISBN: 978-981-99-7019-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics