Historical document image binarization via style augmentation and atrous convolutions

  • S.I. : DICTA 2019
  • Neural Computing and Applications

Abstract

Historical documents suffer from a variety of degradations, making it challenging to recover the original textual content. Image binarization seeks to separate that content from the degradations. In this paper, we present a new binarization technique that accurately learns original text patterns from a limited amount of available historical document data. Our approach is a cascade of a style augmentation network and an image binarization network. The style augmentation network uses random style transfer to increase the variety of the training data by generating new style patterns for the existing documents. The binarization network employs an encoder-decoder text segmentation approach with atrous convolutions to preserve spatial detail. The resulting segmentations have low noise and smooth texture. Compared with other leading binarization methods from the DIBCO competitions, our proposed method achieves top performance across various evaluation measures.
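
To illustrate why atrous (dilated) convolutions can preserve spatial detail, the following is a minimal pure-Python sketch, not the binarization network described in the paper. The function names `atrous_conv2d` and `receptive_field`, the 3x3 box kernel, and the 7x7 toy image are all assumptions made for illustration. It shows that raising the dilation rate widens the area each output pixel sees while the output keeps the input resolution.

```python
def atrous_conv2d(image, kernel, rate=1):
    """Same-size 2D atrous (dilated) cross-correlation with zero padding.

    image:  list of rows of numbers
    kernel: odd-sized square list of rows
    rate:   spacing between kernel taps (rate=1 is an ordinary convolution)
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spread 'rate' pixels apart.
                    yy = y + (i - half) * rate
                    xx = x + (j - half) * rate
                    # Taps that fall outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def receptive_field(kernel_size, rate):
    """Span in pixels covered by one dilated kernel along each axis."""
    return (kernel_size - 1) * rate + 1


# Hypothetical toy input: a 7x7 image with one bright pixel in the centre.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
box = [[1.0] * 3 for _ in range(3)]

dense = atrous_conv2d(image, box, rate=1)    # each output sees a 3x3 window
dilated = atrous_conv2d(image, box, rate=2)  # each output sees a 5x5 window
```

Both outputs are 7x7, the same size as the input: the wider context comes from spacing out the kernel taps rather than from pooling or striding, which is the property the abstract refers to as preserving spatial detail.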

Acknowledgements

This research is funded by Indonesia Endowment Fund for Education, Ministry of Finance, Republic of Indonesia. Award No.: PRJ-338 /LPDP.3/2017.

Author information

Corresponding author

Correspondence to Hanif Rasyidi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Rasyidi, H., Khan, S. Historical document image binarization via style augmentation and atrous convolutions. Neural Comput & Applic 33, 7339–7352 (2021). https://doi.org/10.1007/s00521-020-05382-9

  • DOI: https://doi.org/10.1007/s00521-020-05382-9
