
Alleviating pseudo-touching in attention U-Net-based binarization approach for the historical Tibetan document images

  • S.I.: IoT-based Health Monitoring System
  • Published:
Neural Computing and Applications

Abstract

Binarization, one of the most popular research directions in computer vision, still faces challenges, especially for degraded historical Tibetan document images. Many U-Net-based binarization approaches can suffer from a particular problem called pseudo-touching, which hampers subsequent procedures including text line segmentation, character segmentation, and recognition. To avoid these undesired pseudo-touching strokes and obtain high-quality binary images, the present work employs several easy-to-use techniques, such as rescaling the input and output of the attention U-Net. Furthermore, we provide insights into accelerating the construction of the training set and discuss the effects of various configurations. Quantitative experimental results on our dataset show that upsampling the input image by a factor of two during the inference phase can alleviate pseudo-touching. This configuration achieves an average P-FM (pseudo F-measure) of 97.73, two percentage points higher than that of U-Net. The proposed approach also copes with common challenges, including non-uniform illumination, stains, and noise, and delivers better performance across several metrics.
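The inference-time rescaling described in the abstract (enlarge the page, run the network, map the prediction back to the original resolution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `model` argument is a hypothetical stand-in for a trained attention U-Net (any callable returning a per-pixel foreground probability map of the same shape as its input), nearest-neighbour upsampling and block averaging are chosen here for simplicity (the paper may use a different interpolation), and the helper names `upsample_nearest`, `downsample_mean`, and `binarize_rescaled` as well as the 0.5 threshold are hypothetical.

```python
import numpy as np
from typing import Callable

# Hypothetical model type: maps a 2-D float image in [0, 1] to a 2-D
# foreground-probability map of the same shape (stand-in for an attention U-Net).
Model = Callable[[np.ndarray], np.ndarray]


def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    """Enlarge a 2-D image by an integer factor by repeating each pixel."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def downsample_mean(prob: np.ndarray, factor: int) -> np.ndarray:
    """Shrink a 2-D map by an integer factor by averaging each factor x factor block."""
    h, w = prob.shape
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the scale factor")
    return prob.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def binarize_rescaled(img: np.ndarray, model: Model, factor: int = 2,
                      threshold: float = 0.5) -> np.ndarray:
    """Run the model on an enlarged copy of the page, then return to the original size.

    Returns a boolean mask in which True marks text (foreground) pixels.
    """
    big = upsample_nearest(img, factor)         # e.g. 2x larger in each dimension
    prob_big = model(big)                       # per-pixel foreground probability
    prob = downsample_mean(prob_big, factor)    # back to the input resolution
    return prob >= threshold


if __name__ == "__main__":
    page = np.array([[0.1, 0.9], [0.9, 0.2]])
    dark_is_text = lambda x: (x < 0.5).astype(float)  # toy stand-in model
    print(binarize_rescaled(page, dark_is_text))
```

A U-Net with four pooling stages typically also needs input sides divisible by 16, so in practice the enlarged page would be padded before inference and cropped afterwards; that step is omitted from this sketch.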




Code availability

Related code is available in the author’s GitHub repository: https://github.com/ssocean/Attention-U-Net


Funding

This study was funded by the National Natural Science Foundation of China (No. 61772430, No. 61375029), the Program for Leading Talent of the State Ethnic Affairs Commission, the Program for Innovative Research Team of SEAC ([2018]98), the Gansu Provincial First-Class Discipline Program of Northwest Minzu University (No. 11080305), and the Postgraduate Support Programs of Northwest Minzu University’s Fundamental Research Funds for the Central Universities (Ymx2021002).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Programming, data collection and analysis were performed by PZ. WW, the corresponding author, supervised the experimental procedure and revised the manuscript multiple times. GZ and YL assisted in labeling part of the data. PZ wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Weilan Wang.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, P., Wang, W., Zhang, G. et al. Alleviating pseudo-touching in attention U-Net-based binarization approach for the historical Tibetan document images. Neural Comput & Applic 35, 13791–13802 (2023). https://doi.org/10.1007/s00521-021-06512-7


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-021-06512-7

Keywords
