Abstract
Historical documents suffer from a variety of degradations, which makes it challenging to recover the original textual content. Image binarization seeks to separate that textual content from the degradations in the image. In this paper, we present a new binarization technique that accurately learns original text patterns from a limited amount of available historical document data. Our approach consists of a cascade of two networks, one for style augmentation and one for image binarization. The style augmentation network uses random style transfer to increase the variety of the training data by generating new style patterns for the existing documents. The binarization network employs an encoder-decoder-based text segmentation approach with atrous convolutions to preserve spatial details. The resulting segmentations have low noise levels and a smooth texture. Compared with other leading binarization methods from the DIBCO competitions, our proposed method achieves top performance across various evaluation measures.
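To illustrate why atrous (dilated) convolutions help preserve spatial detail, the following is a minimal pure-Python sketch, not the network described in this paper. It assumes a single-channel image stored as nested lists, stride 1, and zero padding; the function names `effective_kernel_size` and `atrous_conv2d` and the toy 7x7 input are hypothetical and chosen only for illustration. The point it shows is that spacing the kernel taps `rate` pixels apart enlarges the receptive field while the output keeps the input resolution.

```python
def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel whose taps are spaced `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(image, kernel, rate=1):
    """Stride-1, zero-padded 2D atrous (dilated) convolution on nested lists.

    Like most deep-learning libraries, this computes a cross-correlation
    (the kernel is not flipped). The output has the same height and width
    as the input, so no spatial resolution is lost.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Offsets that centre the dilated kernel on each output pixel.
    pad_y = (effective_kernel_size(kh, rate) - 1) // 2
    pad_x = (effective_kernel_size(kw, rate) - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy = y + i * rate - pad_y
                    xx = x + j * rate - pad_x
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


# Toy 7x7 "document" with a single ink pixel in the centre (illustrative only).
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
box = [[1.0] * 3 for _ in range(3)]

dense = atrous_conv2d(image, box, rate=1)    # taps cover a 3x3 window
dilated = atrous_conv2d(image, box, rate=2)  # same 9 taps cover a 5x5 window
```

With `rate=2`, the same nine weights respond to the ink pixel from positions up to two pixels away, and both outputs remain 7x7. A segmentation network can therefore gather wider context without the downsampling that would blur thin strokes.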
Acknowledgements
This research is funded by the Indonesia Endowment Fund for Education, Ministry of Finance, Republic of Indonesia. Award No.: PRJ-338 /LPDP.3/2017.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Rasyidi, H., Khan, S. Historical document image binarization via style augmentation and atrous convolutions. Neural Comput & Applic 33, 7339–7352 (2021). https://doi.org/10.1007/s00521-020-05382-9
DOI: https://doi.org/10.1007/s00521-020-05382-9