Segmentation of text lines using multi-scale CNN from warped printed and handwritten document images

Dutta, Arpita; Garai, Arpan; Biswas, Samit; Das, Amit Kumar

doi:10.1007/s10032-021-00370-8


Original Paper
Published: 21 May 2021

Volume 24, pages 299–313, (2021)

International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Paper documents are a rich source of information and affect nearly every aspect of human life. They may be printed or handwritten and combine text, figures, tables, charts, and other elements. This paper proposes a method to segment text lines from heavily warped printed and handwritten documents, whether flatbed-scanned or camera-captured. The method treats line segmentation as a semantic segmentation problem and solves it with a multi-scale convolutional neural network. Its line segmentation results outperform a number of similar methods already reported in the literature. Its performance and efficacy are corroborated by test results on a variety of publicly available datasets, including ICDAR, Alireza, IUPR, cBAD, Tobacco-800, and IAM, as well as on our own dataset.
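The abstract does not describe the network's architecture, so the following is only a minimal sketch of the general idea behind multi-scale, per-pixel (semantic) segmentation, not the authors' model. It assumes fixed averaging kernels in place of learned convolution filters and a hypothetical threshold rule in place of a trained classifier; the function names, the scales (1, 3, 5), and the threshold are all illustrative choices.

```python
from typing import List, Sequence

Image = List[List[float]]  # 2D grid of ink intensities in [0, 1]


def box_filter(image: Image, size: int) -> Image:
    """Average each pixel's size x size neighbourhood (odd size, zero padding).

    Stand-in for one learned convolution filter with the given receptive field.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
            out[y][x] = total / (size * size)
    return out


def multi_scale_features(image: Image, scales: Sequence[int] = (1, 3, 5)) -> List[List[List[float]]]:
    """Stack the responses at every scale into one feature vector per pixel."""
    maps = [box_filter(image, s) for s in scales]
    h, w = len(image), len(image[0])
    return [[[m[y][x] for m in maps] for x in range(w)] for y in range(h)]


def segment(image: Image, scales: Sequence[int] = (1, 3, 5), threshold: float = 0.5) -> List[List[int]]:
    """Label each pixel 1 (text line) or 0 (background).

    Hypothetical decision rule: the mean of the multi-scale features is
    compared with a fixed threshold. A trained network would learn this mapping.
    """
    feats = multi_scale_features(image, scales)
    return [[1 if sum(f) / len(f) >= threshold else 0 for f in row] for row in feats]
```

The output of `segment` has the same height and width as the input, which is the defining property of semantic segmentation: every pixel receives its own label. Combining small and large neighbourhoods lets each label depend on both local stroke detail and wider line context.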



Author information

Authors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, India
Arpita Dutta, Samit Biswas & Amit Kumar Das
Department of Computer Science and Engineering, University of Engineering and Management, Kolkata, India
Arpan Garai

Corresponding author

Correspondence to Arpita Dutta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Dutta, A., Garai, A., Biswas, S. et al. Segmentation of text lines using multi-scale CNN from warped printed and handwritten document images. IJDAR 24, 299–313 (2021). https://doi.org/10.1007/s10032-021-00370-8


Received: 27 November 2019
Revised: 14 April 2021
Accepted: 28 April 2021
Published: 21 May 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10032-021-00370-8
