Line and word segmentation of handwritten text document by mid-point detection and gap trailing

Multimedia Tools and Applications

Abstract

This paper presents a method for text-line and word segmentation of unconstrained handwritten documents that uses the horizontal projection histogram (HPH) to detect mid-points and to trail the gaps between lines. The mid-points are estimated from the HPH of the first 100 to 200 columns of the document. Starting from these mid-points, the gap between two consecutive lines is tracked using an HPH computed locally over a block of k rows and j columns. Each HPH block is examined for various cases to locate the optimal rows that separate adjacent lines. The proposed method segments curved, touching, and skewed lines, is robust to writing variation, and is language independent. Word segmentation is not treated as a separate problem; it proceeds alongside line segmentation. As the gap between neighboring lines is trailed, the vertical projection histogram (VPH) of t columns is monitored between the upper and lower separators of a line to find the optimal word separators. The algorithm is evaluated on two separate datasets in different languages (Meitei Mayek and English). On Meitei Mayek handwritten documents, text-line and word segmentation achieve 91.84% and 88.96% accuracy, respectively. On handwritten English documents, the corresponding accuracies are 94.18% and 87.73%.
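
The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized page stored as a NumPy array with ink pixels equal to 1, and it assumes that a mid-point is the centre row of each run of empty rows between two text bands, which the abstract does not define precisely. The function names, the default of 150 columns (chosen inside the stated 100 to 200 range), and the `min_gap` threshold are hypothetical. The block-wise gap trailing that handles curved, touching, and skewed lines is not reproduced here; the word-gap helper only shows the VPH idea on a fixed horizontal band.

```python
import numpy as np

def horizontal_projection(binary, n_cols=150):
    """HPH: ink-pixel count per row over the first n_cols columns."""
    return binary[:, :n_cols].sum(axis=1)

def zero_runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of zeros."""
    padded = np.concatenate(([0], (profile == 0).astype(int), [0]))
    step = np.diff(padded)
    starts = np.flatnonzero(step == 1)
    ends = np.flatnonzero(step == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def gap_midpoints(binary, n_cols=150):
    """Centre row of every empty band lying between two text bands.

    Assumption: this stands in for the paper's mid-point estimate.
    """
    hph = horizontal_projection(binary, n_cols)
    n_rows = len(hph)
    mids = []
    for start, end in zero_runs(hph):
        # Skip the top and bottom page margins; they do not separate lines.
        if start == 0 or end == n_rows:
            continue
        mids.append((start + end - 1) // 2)
    return mids

def word_gaps(binary, top, bottom, min_gap=3):
    """Centre column of each empty column run inside rows [top, bottom).

    Uses the VPH of the band; min_gap is a hypothetical threshold that
    keeps narrow gaps between characters from being reported.
    """
    vph = binary[top:bottom, :].sum(axis=0)
    n_cols = len(vph)
    return [
        (start + end - 1) // 2
        for start, end in zero_runs(vph)
        if start > 0 and end < n_cols and end - start >= min_gap
    ]
```

Calling `gap_midpoints(page)` on a binarized page returns one candidate row per inter-line gap. Passing two consecutive separator rows as `top` and `bottom` to `word_gaps` returns candidate word-separator columns for the line between them.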


Author information

Corresponding author

Correspondence to Inunganbi Sanasam.

About this article

Cite this article

Sanasam, I., Choudhary, P. & Singh, K.M. Line and word segmentation of handwritten text document by mid-point detection and gap trailing. Multimed Tools Appl 79, 30135–30150 (2020). https://doi.org/10.1007/s11042-020-09416-1

  • DOI: https://doi.org/10.1007/s11042-020-09416-1
