Abstract
Segmenting words and sub-words into characters is one of the steps in a character recognition system. For text written in any Arabic script this step is especially difficult, and for that reason many systems treat the sub-word, rather than the character, as the basic unit of recognition. We propose a method for segmenting printed Arabic words/sub-words into characters. The method first separates the primary and secondary strokes of each sub-word and then identifies segmentation points in the primary strokes. To do so, we compute the vertical projection graph for each line and process it to generate a string that encodes the relative variations in pixel counts. This string is then scanned to split the sub-words into characters. We apply the method to Sindhi text because its character set is a superset of the Arabic character set. The method can also be used for other Naskh-based Arabic scripts such as Persian, Pashto and Urdu.
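The projection-and-string idea can be illustrated with a minimal sketch. The abstract does not specify the encoding or the scanning rules, so everything below is an assumption made for illustration: the symbols `U`, `D` and `S` (rise, fall, same), the function names, the `min_run` parameter, and the rule that a cut is placed in the middle of a flat run at the thinnest inked height (taken here as the connecting baseline stroke). The input is assumed to be a binary image of primary strokes only, with 1 marking ink.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def variation_string(profile):
    """Encode each column-to-column change as 'U' (rise), 'D' (fall) or 'S' (same).

    This three-symbol alphabet is a hypothetical choice, not the paper's.
    """
    symbols = []
    for prev, cur in zip(profile, profile[1:]):
        symbols.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(symbols)


def candidate_cuts(profile, symbols, min_run=2):
    """Return the middle column of each flat run lying at baseline thickness.

    Baseline thickness is assumed to be the smallest non-zero projection value.
    """
    inked = [v for v in profile if v > 0]
    if not inked:
        return []
    baseline = min(inked)
    cuts = []
    i = 0
    while i < len(symbols):
        if symbols[i] != "S":
            i += 1
            continue
        j = i
        while j < len(symbols) and symbols[j] == "S":
            j += 1
        # symbols[i:j] are all 'S', so columns i..j share one height
        if profile[i] == baseline and j - i >= min_run:
            cuts.append((i + j) // 2)
        i = j
    return cuts


# Toy sub-word: two vertical strokes joined by a one-pixel baseline.
image = [
    [0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
]
profile = vertical_projection(image)
symbols = variation_string(profile)
cuts = candidate_cuts(profile, symbols)
```

For this toy image the profile is flat along the joining stroke, so the sketch proposes a single cut in the middle of it. Real Naskh text would need additional rules (for example, for characters whose own body is flat), which the abstract does not describe.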
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shaikh, N.A., Shaikh, Z.A., Ali, G. (2008). Segmentation of Arabic Text into Characters for Recognition. In: Hussain, D.M.A., Rajput, A.Q.K., Chowdhry, B.S., Gee, Q. (eds) Wireless Networks, Information Processing and Systems. IMTIC 2008. Communications in Computer and Information Science, vol 20. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89853-5_4
DOI: https://doi.org/10.1007/978-3-540-89853-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89852-8
Online ISBN: 978-3-540-89853-5
eBook Packages: Computer Science, Computer Science (R0)