Annotated comparisons of proposed preprocessing techniques for script recognition

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Offline cursive script recognition and its associated issues remain open problems despite the last few decades of research. This paper presents an annotated comparison of proposed and recently published preprocessing techniques with reported work in offline cursive script recognition. In offline script analysis, the input is normally a paper image, a word or a digit, and the desired output is ASCII text. This task involves several preprocessing steps, some of them quite hard: line removal from text, skew removal, reference line detection (lower/upper baselines), slant removal, scaling, noise elimination, contour smoothing and skeletonization. Moreover, the subsequent stages of segmentation (if any) and recognition depend heavily on these preprocessing techniques. This paper presents an analysis and annotated comparison of the latest preprocessing techniques proposed by the authors with those reported in the literature on the IAM/CEDAR benchmark databases. Finally, future work and persistent problems are highlighted.
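To make one of the preprocessing steps listed above concrete, the following is a minimal sketch of slant removal on a binarized word image. It is not the technique proposed in this paper. It uses a generic projection-profile heuristic: shear the image by a range of candidate factors and keep the one whose vertical projection is most sharply peaked. The function names (`shear_rows`, `profile_score`, `deslant`), the candidate range and the scoring function are all assumptions chosen for illustration.

```python
import numpy as np

def shear_rows(img, shear):
    """Shift each row horizontally by shear * (distance from the bottom row)."""
    h, w = img.shape
    # Pad both sides so that shifted rows never fall outside the canvas.
    pad = int(np.ceil(abs(shear) * (h - 1)))
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    for y in range(h):
        dx = int(round(shear * (h - 1 - y)))
        out[y, pad + dx : pad + dx + w] = img[y]
    return out

def profile_score(img):
    """Sum of squared column counts: large when strokes stand upright."""
    cols = img.sum(axis=0).astype(float)
    return float((cols ** 2).sum())

def deslant(img, shears=None):
    """Try candidate shear factors and return (corrected image, best shear)."""
    if shears is None:
        # Hypothetical search range: shear factors from -1 to 1 in steps of 0.05.
        shears = np.linspace(-1.0, 1.0, 41)
    best = max(shears, key=lambda s: profile_score(shear_rows(img, s)))
    return shear_rows(img, best), float(best)
```

Here `img` is assumed to be a 2D array with foreground (ink) pixels set to 1 and background to 0. The squared-column score is only one of several possible criteria, and a uniform shear of this kind cannot handle slant that varies within a word.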

Acknowledgments

Our deepest thanks and appreciation go to the Deanship of Scientific Research at King Saud University (KSU), Riyadh, Kingdom of Saudi Arabia, for funding this research. We are also thankful to our fellow researchers for their assistance in the neural network training and testing phases.

Author information

Corresponding author

Correspondence to Amjad Rehman.

About this article

Cite this article

Saba, T., Rehman, A., Altameem, A. et al. Annotated comparisons of proposed preprocessing techniques for script recognition. Neural Comput & Applic 25, 1337–1347 (2014). https://doi.org/10.1007/s00521-014-1618-9

  • DOI: https://doi.org/10.1007/s00521-014-1618-9
