Alleviating pseudo-touching in attention U-Net-based binarization approach for the historical Tibetan document images

Zhao, Penghai; Wang, Weilan; Zhang, Guowei; Lu, Yuqi

doi:10.1007/s00521-021-06512-7

S.I.: IoT-based Health Monitoring System
Published: 04 October 2021

Volume 35, pages 13791–13802, (2023)
Neural Computing and Applications

Abstract

Binarization, one of the most active research directions in computer vision, still faces challenges, especially for degraded historical Tibetan document images. Many U-Net-based binarization approaches can suffer from a problem known as pseudo-touching, which hampers subsequent procedures such as text line segmentation, character segmentation, and recognition. To avoid these undesired pseudo-touching strokes and obtain optimal binary images, the present work employs several easy-to-use techniques, such as rescaling the input and output of the attention U-Net. We also provide insights into accelerating the construction of the training set and discuss the effects of various configurations. Quantitative experimental results on our dataset show that upsampling the input image by a factor of two during the inference phase alleviates pseudo-touching. This configuration achieves an average pseudo F-measure (P-FM) of 97.73, two percentage points higher than that of U-Net. The proposed approach also copes with common degradations, including non-uniform illumination, stains, and noise, and performs better on several evaluation metrics.
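The inference-time rescaling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `model` is a hypothetical callable that maps a grayscale image in [0, 1] to per-pixel text probabilities of the same size, upsampling uses nearest-neighbour repetition, downsampling uses 2x2 average pooling, and the 0.5 threshold and "text = 0" output convention are illustrative choices. The interpolation methods and threshold actually used in the paper are not stated here.

```python
import numpy as np


def upsample2x(img: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling (assumed; the paper's interpolation may differ)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def downsample2x(prob: np.ndarray) -> np.ndarray:
    """Average-pool each 2x2 block back to the original resolution."""
    h, w = prob.shape
    return prob.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def binarize_rescaled(img: np.ndarray, model, threshold: float = 0.5) -> np.ndarray:
    """Run a (hypothetical) segmentation model on a 2x-upsampled image.

    The probability map predicted at the higher resolution is pooled back
    to the input size and thresholded. Output: 0 = text, 1 = background.
    """
    prob_up = model(upsample2x(img))   # per-pixel text probability at 2x size
    prob = downsample2x(prob_up)       # back to the original H x W
    return np.where(prob >= threshold, 0, 1).astype(np.uint8)
```

For example, with a toy stand-in model that treats dark pixels as text, `binarize_rescaled(np.array([[0.1, 0.9], [0.8, 0.2]]), lambda x: (x < 0.5).astype(float))` returns `[[0, 1], [1, 0]]`. In practice the model would be a trained attention U-Net and the resizing would typically use an image library's interpolation routine.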

Code availability

Related code is available in the author’s GitHub repository: https://github.com/ssocean/Attention-U-Net

Funding

This study was funded by the National Natural Science Foundation of China (No.61772430, No.61375029), the program for Leading Talent of the State Ethnic Affairs Commission, the program for Innovative Research Team of SEAC ([2018]98), the Gansu Provincial first-class discipline program of Northwest Minzu University (No.11080305), and the Postgraduate Support Programs of Northwest Minzu University’s Fundamental Research Funds for the Central Universities (Ymx2021002).

Author information

Authors and Affiliations

Key Laboratory of China’s Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, 730000, China
Penghai Zhao, Weilan Wang, Guowei Zhang & Yuqi Lu

Contributions

All authors contributed to the study conception and design. Programming, data collection, and analysis were performed by PZ. WW, the corresponding author, supervised the experimental procedure and revised the manuscript multiple times. GZ and YL assisted in labeling part of the data. The first draft of the manuscript was written by PZ, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Weilan Wang.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

About this article

Cite this article

Zhao, P., Wang, W., Zhang, G. et al. Alleviating pseudo-touching in attention U-Net-based binarization approach for the historical Tibetan document images. Neural Comput & Applic 35, 13791–13802 (2023). https://doi.org/10.1007/s00521-021-06512-7

Received: 08 July 2021
Accepted: 07 September 2021
Published: 04 October 2021
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00521-021-06512-7