Abstract
We present a learning-based method for handwritten text line segmentation in document images. Our approach relies on a variant of deep fully convolutional networks (FCNs) with dilated convolutions. Dilated convolutions allow to never reduce the input resolution and produce a pixel-level labeling. The FCN is trained to identify X-height labeling as text line representation, which has many advantages for text recognition. We show that our approach outperforms the most popular variants of FCN, based on deconvolution or unpooling layers, on a public dataset. We also provide results investigating various settings, and we conclude with a comparison of our model with recent approaches defined as part of the cBAD (https://scriptnet.iit.demokritos.gr/competitions/5/) international competition, leading us to a 91.3% F-measure.










Similar content being viewed by others
Notes
Please note that a preliminary work has been presented at the ICDAR-WML workshop [25].
References
Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder–decoder architecture for image segmentation (2015). arXiv:1511.00561
Chen, L., Papandreou, V., Kokkinos, I., Murphy, K., Yuille, A.: Semantic image segmentation with deep convolutional nets and fully connected crfs (2014). arXiv:1412.7062
Chen, LC., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs (2016). arXiv:1606.00915
Chen, LC., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017). arXiv:1706.05587
Eskenazi, S., Gomez-Krämer, P., Ogier, J.M.: A comprehensive survey of mostly textual document segmentation algorithms since 2008. Pattern Recognit. 64, 1–14 (2017)
Girshick, R.: Fast r-cnn. In: ICCV, pp. 1440–1448 (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)
Grüning, T., Labahn, R., Diem, M., Kleber, F., Fiel, S.: Read-bad: a new dataset and evaluation scheme for baseline detection in archival documents (2017). arXiv:1705.03311
Holschneider, M., Kronland-Martinet, R., Morlet, J., Tchamitchian, P.: A real-time algorithm for signal analysis with the help of the wavelet transform. In: Wavelets, pp. 286–297. Springer (1989)
Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced mser trees. In: ECCV, pp. 497–511 (2014)
Krähenbühl, P.: Koltun, V.: Efficient inference in fully connected CRFs with gaussian edge potentials. In: NIPS, pp. 109–117 (2011)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., Berg, A.: Ssd: Single shot multibox detector. In: ECCV, pp. 21–37. Springer (2016)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)
Moysset, B., Adam, P., Wolf, C., Louradour, J.: Space displacement localization neural networks to locate origin points of handwritten text lines in historical documents. In: Workshop on Historical Document Imaging and Processing, August (2015)
Moysset, B., Kermorvant, C., Wolf, C.: Full-page text recognition: learning where to start and when to stop. In: ICDAR (2017)
Moysset, B., Kermorvant, C., Wolf, C., Louradour, J.: Paragraph text segmentation into lines with recurrent neural networks. In: ICDAR, pp. 456–460 (2015)
Moysset, B., Louradour, J., Kermorvant, C., Wolf, C.: Learning text-line localization with shared and local regression neural networks. In: ICFHR (2016)
Murdock, M., Reid, S., Hamilton, B., Reese, J.: Icdar 2015 competition on text line detection in historical documents. In: ICDAR, pp, 1171–1175 (2015)
Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: ICCV, pp. 1520–1528 (2015)
Paquet, T., Heutte, L., Koch, G., Chatelain, C.: A categorization system for handwritten documents. IJDAR 15(4), 315–330 (2012)
Parvez, M.T., Mahmoud, S.A.: Offline arabic handwritten text recognition: a survey. ACM Comput. Surv. (CSUR) 45(2), 23 (2013)
Peng, C., Zhang, X., Yu, G., Luo, G., Sun, J.: Large kernel matters—improve semantic segmentation by global convolutional network (2017). arXiv:1703.02719
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. CoRR, abs/1612.08242 (2016)
Renton, G., Chatelain, C., Adam, S., Kermorvant, C., Paquet, T.: Handwritten text line segmentation using fully convolutional network. In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, vol. 5, pp. 5–9. IEEE (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597 (2015)
Ryu, J., Koo, H.I., Cho, N.I.: Language-independent text-line extraction algorithm for handwritten documents. Signal Process. Lett. 21(9), 1115–1119 (2014)
Shi, Z., Setlur, S., Govindaraju, V.: A steerable directional local profile technique for extraction of handwritten arabic text lines. In: ICDAR, pp. 176–180 (2009)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556 (2014)
Stamatopoulos, N., Gatos, B., Louloudis, G., Pal, U., Alaei, A.: Icdar 2013 handwriting segmentation contest. In: ICDAR, pp. 1402–1406 (2013)
Stuner, B., Chatelain, C., Paquet, T.: LV-ROVER: lexicon verified recognizer output voting error reduction. CoRR, abs/1707.07432 (2017)
Vo, Q.N., Lee, G.: Dense prediction for text line segmentation in handwritten document images. In: ICIP, pp. 3264–3268 (2016)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions (2015). arXiv:1511.07122
Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks (2016). arXiv:1604.04018
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.: Conditional random fields as recurrent neural networks. In: ICCV, pp. 1529–1537 (2015)
Zhu, S., Zanibbi, R.: A text detection system for natural scenes with convolutional feature learning and cascaded classification. In: CVPR, pp. 625–632 (2016)
Author information
Authors and Affiliations
Corresponding author
Additional information
This work has been supported by the French National grant ANR 16-LCV2-0004-01 Labcom INKS. This work is founded by the French region Normandy and the European Union. Europe acts in Normandy with the European Regional Development Fund (ERDF).
Rights and permissions
About this article
Cite this article
Renton, G., Soullard, Y., Chatelain, C. et al. Fully convolutional network with dilated convolutions for handwritten text line segmentation. IJDAR 21, 177–186 (2018). https://doi.org/10.1007/s10032-018-0304-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10032-018-0304-3