research-article

A hybrid CNN-Transformer model for Historical Document Image Binarization

Authors:

Vahid Rezanezhad,

Konstantin Baierer,

Clemens Neudecker

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing

Pages 79 - 84

https://doi.org/10.1145/3604951.3605508

Published: 25 August 2023

Abstract

Document image binarization is one of the main preprocessing steps in document image analysis for text recognition. Noise, faint characters, poor scanning conditions, uneven lighting or paper aging can cause artifacts that degrade the performance of text recognition algorithms. The task of binarization is to separate the foreground (text) from these degradations in order to improve optical character recognition (OCR) results. Convolutional Neural Networks (CNNs) are a popular approach to binarization, but they tend to focus on local context within a document image. We apply a hybrid CNN-Transformer model to convert a document image into a binary output. To train the model, we used datasets from the Document Image Binarization Contests (DIBCO). On the DIBCO-2012, DIBCO-2017 and DIBCO-2018 datasets, our model outperforms state-of-the-art algorithms.
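The abstract's point that CNNs focus on local context can be made concrete through the receptive field: the square window of input pixels that can influence a single output pixel. The following sketch is illustrative only and does not describe the authors' architecture. The function name `receptive_field` and both layer stacks (ten plain 3×3 convolutions, and four hypothetical VGG-style blocks of two 3×3 convolutions followed by 2×2 pooling) are assumptions, and padding and dilation are ignored.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Side length (in input pixels) of the window seen by one output unit.

    `layers` lists (kernel_size, stride) pairs for a stack of convolution or
    pooling layers, applied in order. Uses the standard recurrence
        r_l = r_{l-1} + (k_l - 1) * j_{l-1},   j_l = j_{l-1} * s_l,
    where j is the cumulative stride ("jump") between adjacent output units.
    Padding and dilation are ignored (a simplifying assumption).
    """
    r, j = 1, 1  # start from a single input pixel with unit spacing
    for kernel, stride in layers:
        r += (kernel - 1) * j  # each layer widens the window by (k - 1) jumps
        j *= stride            # downsampling spreads later kernels further apart
    return r


if __name__ == "__main__":
    # Hypothetical stack: ten 3x3 stride-1 convolutions.
    plain = [(3, 1)] * 10
    # Hypothetical VGG-style stack: 4 x (conv 3x3, conv 3x3, pool 2x2 stride 2).
    vgg_like = [(3, 1), (3, 1), (2, 2)] * 4
    print(receptive_field(plain))     # 21
    print(receptive_field(vgg_like))  # 76
```

Under these assumptions, each output pixel of the deeper stack still sees only a 76×76-pixel window, a small fraction of a full-page scan that is typically thousands of pixels on a side. A self-attention layer, by contrast, lets every token attend to every other token in a single step, which is the kind of wider context the abstract describes as missing from CNNs.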

Cited By

Ranjan A, Ravinder M (2024). ROBDD-TrOCRBERTa: a novel robust-optimized blurred document text deblurring and completion with DCGAN-TrOCR and DistilRoBERTa. International Journal of Information Technology 16, 7, 4611–4619. Online publication date: 20 Jul 2024.
https://doi.org/10.1007/s41870-024-02073-9

Quattrini F, Pippi V, Cascianelli S, Cucchiara R (2024). Binarizing Documents by Leveraging both Space and Frequency. In Document Analysis and Recognition - ICDAR 2024, 3–22. Online publication date: 9 Sep 2024.
https://doi.org/10.1007/978-3-031-70543-4_1

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
2. Information systems
  1. Information systems applications
    1. Digital libraries and archives

Information

Published In

ACM Other conferences

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing

August 2023

117 pages

ISBN:9798400708411

DOI:10.1145/3604951

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

BKM

Conference

HIP '23

HIP '23: 7th International Workshop on Historical Document Imaging and Processing

August 25 - 26, 2023

San Jose, CA, USA

Acceptance Rates

Overall Acceptance Rate 52 of 90 submissions, 58%

Bibliometrics & Citations

Article Metrics

Total Citations: 2
Total Downloads: 122
Downloads (last 12 months): 50
Downloads (last 6 weeks): 4

Reflects downloads up to 07 Mar 2025
