Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents

  • Theoretical Advances
  • Published:
Pattern Analysis and Applications

Abstract

The most important and difficult task in text document analysis is accurate line segmentation, particularly when the document consists of unconstrained handwritten text. This work proposes a painting scheme to accomplish it. Because handwritten Persian text poses the most critical challenges for text-line segmentation, the method was devised through extensive study of cursive Persian script; nevertheless, the proposed line segmentation algorithm is applicable to handwritten text in any language or script. The text block is decomposed vertically into parallel pipe structures called strips. Each row in each strip is painted with a gray intensity equal to the average gray value of all pixels in that row of the strip. The painted strips are then converted into a two-tone painting, which is smoothed. The white and black spaces in each strip of the smoothed image are analyzed to obtain a short line of separation, termed a Piece-wise Potential Separating Line (PPSL), between two consecutive black spaces. The PPSLs are concatenated to produce the text-line segmentation. Additional procedures handle certain anomalies that may occur. The scheme is validated by extensive experimentation. The algorithm was tested on 52 pages of Persian text documents containing 823 lines in total, and it achieved a correct line segmentation rate of 92.35%. It was also tested on two further datasets of 152 and 200 handwritten text pages in different languages. Comparison with various approaches from the recent literature demonstrated the efficiency and script independence of the proposed algorithm.
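The painting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `paint_strips`, the fixed `strip_width` parameter, the list-of-lists grayscale image, and the use of integer (floor) averaging are all assumptions made for illustration. The gray convention assumed here is 0 for black ink and 255 for white background.

```python
def paint_strips(image, strip_width):
    """Paint each row of each vertical strip with that row-strip's mean gray value.

    `image` is a list of rows, each row a list of gray values (0..255).
    `strip_width` is a hypothetical fixed width chosen by the caller.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    painted = [[0] * width for _ in range(height)]
    # Walk over the strips from left to right; the last strip may be narrower.
    for left in range(0, width, strip_width):
        right = min(left + strip_width, width)
        for y in range(height):
            segment = image[y][left:right]
            mean = sum(segment) // len(segment)  # floor of the average gray value
            for x in range(left, right):
                painted[y][x] = mean
    return painted


# Tiny 2x4 image split into two strips of width 2.
tiny = [
    [0, 255, 255, 255],
    [255, 255, 255, 255],
]
painted_tiny = paint_strips(tiny, 2)
```

In this toy example only the first row of the left strip contains ink, so only that row-strip is painted darker than the background.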

Author information

Corresponding author

Correspondence to Alireza Alaei.

Appendix: contributions in this paper

Line segmentation of unconstrained handwritten documents is difficult because writing styles vary between individuals. Characters of two consecutive text lines may touch or overlap, which makes the segmentation task more complex. In this paper, a painting scheme is proposed to facilitate unconstrained handwritten text-line segmentation. In the proposed scheme, the input text page is decomposed vertically into parallel pipe structures called strips. The strip width is computed automatically from the space (gap) between consecutive lines in each text page. Each row of a strip is painted with a gray intensity equal to the average gray value of all pixels in that row of the strip. The painted strips are then converted into a two-tone painted image, which is smoothed using smoothing operations. The white and black spaces in each strip of the smoothed image are analyzed to obtain a short line of separation, called a Piece-wise Potential Separating Line (PPSL), between two consecutive black spaces. Finally, the PPSLs are concatenated or extended to separate the text lines. The proposed method can also handle touching and overlapping cases: the system first detects the touching/overlapping zones and then segments them based on their structural behavior.
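The two-tone conversion, smoothing, and PPSL extraction for a single strip can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure from the paper: the fixed `threshold`, the `min_run` smoothing rule (flipping short interior runs), and placing each PPSL at the middle row of a white gap are all hypothetical choices, as are the function names. The input `profile` is assumed to hold one painted gray value per row of the strip (0 for black, 255 for white).

```python
def binarize(profile, threshold=128):
    """Two-tone conversion: True marks a black (text) row, False a white row."""
    return [value < threshold for value in profile]


def runs(tones):
    """Return (tone, start, end_exclusive) for each run of equal consecutive tones."""
    result = []
    start = 0
    for i in range(1, len(tones) + 1):
        if i == len(tones) or tones[i] != tones[start]:
            result.append((tones[start], start, i))
            start = i
    return result


def smooth(tones, min_run=2):
    """Flip interior runs shorter than min_run so they merge with their neighbours."""
    smoothed = list(tones)
    for tone, start, end in runs(tones):
        is_interior = start > 0 and end < len(tones)
        if is_interior and end - start < min_run:
            for i in range(start, end):
                smoothed[i] = not tone
    return smoothed


def ppsl_rows(tones):
    """Middle row of every white run that lies between two black runs."""
    strip_runs = runs(tones)
    rows = []
    # Skip the first and last runs: they touch the strip border, so they are
    # not enclosed by two black runs. Interior white runs always are, because
    # consecutive runs alternate in tone.
    for tone, start, end in strip_runs[1:-1]:
        if not tone:
            rows.append((start + end - 1) // 2)
    return rows


# One strip: two dark bands (rows 1-2 and 6-7) separated by a white gap (rows 3-5).
profile = [255, 40, 30, 250, 255, 245, 20, 35, 255]
separators = ppsl_rows(smooth(binarize(profile)))
```

Here the only white gap enclosed by two black bands spans rows 3 to 5, so the sketch places one separating row in its middle. Joining such per-strip separators across neighbouring strips, and handling touching or overlapping zones, is left out of this sketch.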

The scheme is validated by extensive experimentation with many scripts. The proposed algorithm was tested on 52 pages of Persian text documents containing 823 lines in total, and a line segmentation accuracy of 92.35% was achieved. It was also tested on two further datasets containing 152 and 200 handwritten text pages in different languages such as English, Greek, French, and German. Comparison with various approaches from the recent literature demonstrated the efficiency and script independence of the proposed algorithm.

About this article

Cite this article

Alaei, A., Nagabhushan, P. & Pal, U. Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents. Pattern Anal Applic 14, 381–394 (2011). https://doi.org/10.1007/s10044-011-0226-x

  • DOI: https://doi.org/10.1007/s10044-011-0226-x
