Abstract
Offline cursive script recognition and its associated issues remain open problems despite the last few decades of research. This paper presents an annotated comparison of proposed and recently published preprocessing techniques with reported work in offline cursive script recognition. Normally, in offline script analysis, the input is an image of a page, a word or a digit, and the desired output is ASCII text. This task involves several preprocessing steps, some of which are quite hard, such as line removal from text, skew removal, reference line detection (lower/upper baselines), slant removal, scaling, noise elimination, contour smoothing and skeletonization. Moreover, the subsequent stages of segmentation (if any) and recognition are also highly dependent on these preprocessing techniques. This paper presents an analysis and annotated comparison of the latest preprocessing techniques proposed by the authors with those reported in the literature on the IAM/CEDAR benchmark databases. Finally, future work and persistent problems are highlighted.
Acknowledgments
We express our deepest thanks and appreciation to the Deanship of Scientific Research at King Saud University (KSU), Riyadh, Kingdom of Saudi Arabia, for funding this research. We are also thankful to our fellow researchers for their assistance in the neural network training and testing phases.
Cite this article
Saba, T., Rehman, A., Altameem, A. et al. Annotated comparisons of proposed preprocessing techniques for script recognition. Neural Comput & Applic 25, 1337–1347 (2014). https://doi.org/10.1007/s00521-014-1618-9
DOI: https://doi.org/10.1007/s00521-014-1618-9