DOI: 10.1145/3604951.3605508
research-article

A hybrid CNN-Transformer model for Historical Document Image Binarization

Published: 25 August 2023

Abstract

Document image binarization is one of the main preprocessing steps in document image analysis for text recognition. Noise, faint characters, poor scanning conditions, uneven lighting, and paper aging can introduce artifacts that degrade the performance of text recognition algorithms. The task of binarization is to separate the foreground (text) from these degradations in order to improve optical character recognition (OCR) results. Convolutional neural networks (CNNs) are a popular method for binarization, but they tend to focus on the local context of a document image. We apply a hybrid CNN-Transformer model to convert a document image into a binary output. To train the model, we used datasets from the Document Image Binarization Contests (DIBCO). On the DIBCO-2012, DIBCO-2017, and DIBCO-2018 datasets, our model outperforms the state-of-the-art algorithms.
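
To make the binarization task concrete, the sketch below applies classical global thresholding in the style of Otsu to a tiny grayscale image. It is not the hybrid CNN-Transformer model described in the abstract; it only illustrates what "converting a document image into a binary output" means. The function names `otsu_threshold` and `binarize`, and the output convention (0 for text, 255 for background), are assumptions made for this illustration.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # bright class empty, no further split possible
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map dark pixels (assumed text) to 0 and bright pixels (background) to 255."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 grayscale patch: two dark "ink" pixels on a bright page.
    patch = [[200, 210, 30],
             [220, 25, 215]]
    print(binarize(patch))
```

A single global threshold like this treats every pixel independently of its position, which is why it struggles with the uneven lighting and paper aging mentioned above; learned models aim to use spatial context instead.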

Cited By

  • (2024) ROBDD-TrOCRBERTa: a novel robust-optimized blurred document text deblurring and completion with DCGAN-TrOCR and DistilRoBERTa. International Journal of Information Technology 16:7 (4611-4619). DOI: 10.1007/s41870-024-02073-9. Online publication date: 20-Jul-2024
  • (2024) Binarizing Documents by Leveraging both Space and Frequency. Document Analysis and Recognition - ICDAR 2024 (3-22). DOI: 10.1007/978-3-031-70543-4_1. Online publication date: 9-Sep-2024

Published In

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing
August 2023
117 pages
ISBN:9798400708411
DOI:10.1145/3604951
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Binarization
  2. Deep learning
  3. Document processing
  4. Image enhancement
  5. OCR
  6. Transformers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • BKM

Conference

HIP '23

Acceptance Rates

Overall Acceptance Rate 52 of 90 submissions, 58%

Article Metrics

  • Downloads (Last 12 months): 50
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 07 Mar 2025