Skeletonization of Arabic characters using clustering based skeletonization algorithm (CBSA)

doi:10.1016/0031-3203(91)90058-D

Pattern Recognition

Volume 24, Issue 5, 1991, Pages 453-464

https://doi.org/10.1016/0031-3203(91)90058-D

Abstract

Character skeletonization is an essential step in many character recognition techniques. This paper addresses the skeletonization of Arabic characters. Whereas other techniques employ thinning algorithms, this paper uses clustering of Arabic characters. The use of a clustering technique (a computationally expensive step) is justified by the properties of the generated skeleton, which has the advantages of other thinning techniques and is robust. The presented technique may be used in the modeling and training stages to reduce the processing time of the recognition system.
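The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea under stated assumptions. The foreground pixels of a binary character image are clustered (here with plain k-means seeded by farthest-point selection, a stand-in that is not necessarily the clustering CBSA uses), and the centers of any two clusters whose pixels are 8-adjacent are linked, so that centers and links together form a graph-like skeleton. The function names `farthest_point_init`, `kmeans` and `cluster_skeleton`, the linking rule, and the number of clusters `k` are all hypothetical.

```python
import numpy as np


def farthest_point_init(points, k):
    """Deterministic seeding (assumption): start at the first pixel, then
    repeatedly add the pixel farthest from every center chosen so far."""
    if k > len(points):
        raise ValueError("k must not exceed the number of foreground pixels")
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(1, k):
        idx = int(dist.argmax())
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen].astype(float)


def assign(points, centers):
    """Index of the nearest center for each pixel (ties go to the lower index)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(points, k, iters=50):
    """Plain Lloyd-style k-means on an (n, 2) array of pixel coordinates."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        # Move each center to the mean of its pixels; keep it if the cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)


def cluster_skeleton(image, k):
    """Cluster the foreground pixels of a binary image and link the centers of
    clusters whose pixels are 8-adjacent. Returns (centers, sorted edge list)."""
    image = np.asarray(image, dtype=bool)
    points = np.argwhere(image)  # (row, col) of every foreground pixel
    centers, labels = kmeans(points, k)
    label_img = np.full(image.shape, -1)
    label_img[points[:, 0], points[:, 1]] = labels
    rows, cols = image.shape
    edges = set()
    # Checking these four offsets from every pixel visits each 8-neighbour pair once.
    for (r, c), a in zip(points, labels):
        for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                b = label_img[rr, cc]
                if b >= 0 and b != a:
                    edges.add((int(min(a, b)), int(max(a, b))))
    return centers, sorted(edges)
```

For a 3-pixel-thick horizontal bar split into three clusters, this yields three centers near the bar's middle row, joined in a chain by two links. The choice of `k` controls how finely the strokes are sampled; how CBSA selects the number of clusters is not stated in the abstract.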


Cited by (33)

Manifold learning for the shape-based recognition of historical arabic documents
2013, Handbook of Statistics
Citation excerpt:
Usually, a fraction of the information available is extracted and used as features. Skeleton images, in contrast, contain much less noisy information (Mahmoud et al., 1991; Steinherz et al., 2000). However, it is much more difficult to compare them at the image level, and so it is the high-level features that are usually extracted from them (Zhu, 2007).
In this work, a recognition approach applicable at the letter block (subword) level for Arabic manuscripts is introduced. The approach starts with the binary images of the letter block to build their input representation, which makes it highly objective and independent of the designer. Then, using two different manifold learning techniques, the representations are reduced and learned. In order to decrease the computational complexity, PCA is applied to the input representations before manifold learning is applied. Also, in order to increase the performance and quality of the input representations, a gray stroke map (GSM) is considered in addition to the binary images. The performance of the approach is tested against a database from a historical Arabic manuscript with promising results.
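The excerpt says PCA is applied to the input representations before manifold learning. Below is a minimal NumPy sketch of that reduction step alone, assuming every letter-block image has already been brought to a common size (the excerpt does not say how sizes are normalized) and leaving out both the gray stroke map and the two manifold learners, whose identities are not given here. `pca_reduce` is a hypothetical name.

```python
import numpy as np


def pca_reduce(images, n_components):
    """Flatten equally sized images and project them onto their top principal
    components, computed from an SVD of the mean-centered data matrix.
    Returns (projected data, components, explained-variance ratios)."""
    X = np.asarray([np.asarray(img, dtype=float).ravel() for img in images])
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return Xc @ components.T, components, explained[:n_components]
```

The reduced rows could then be passed to a manifold learner; scikit-learn's `sklearn.manifold.Isomap` is one readily available option, not necessarily one the chapter used.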
High-performance Arabic character recognition
1998, Journal of Systems and Software
Many Arabic character recognition systems have been proposed since the early eighties. Most systems reported high recognition rates; however, they overlooked a very important factor in the process: speed. In this paper, a high-performance Arabic character recognition system is introduced. The goal of the system is to maximize both accuracy and speed. The goal has been achieved by developing a high-accuracy sequential Arabic character recognition system and then mapping it onto a multi-processing environment. Experimental results show that the multi-processing environment is very promising for enhancing the performance of a sequential Arabic character recognition system.
Appendix: Digital topology - A brief introduction and bibliography
1996, Machine Intelligence and Pattern Recognition
In image processing and computer graphics, an object in the plane or 3-space is approximated digitally by a set of pixels or voxels. Digital topology studies the properties of this set of pixels or voxels that correspond to topological properties of the original object. It provides theoretical foundations for important operations such as digitization, connected component labeling and counting, boundary extraction, contour filling, and thinning. Some basic terminology and fundamental concepts of digital topology are discussed and important areas of the field are described in the chapter.
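Connected component labeling and counting, listed above among the operations digital topology underpins, depend on which adjacency is chosen for foreground pixels. A small breadth-first-search sketch in plain Python follows; `label_components` is a hypothetical helper, not an API from the chapter, and the image is assumed to be a list of rows of 0/1 values.

```python
from collections import deque


def label_components(image, connectivity=8):
    """Label the connected foreground components of a 2-D binary image by
    breadth-first search. Returns (label grid, number of components)."""
    rows, cols = len(image), len(image[0])
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    elif connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and labels[r][c] == 0:
                # Unlabeled foreground pixel: start a new component here.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Two pixels that touch only at a corner form one component under 8-connectivity and two under 4-connectivity, which is why the usual convention pairs 8-connectivity for the foreground with 4-connectivity for the background (or vice versa).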
A neural network based dedicated thinning method
1995, Pattern Recognition Letters
Survey and bibliography of Arabic optical text recognition
1995, Signal Processing
Research work on Arabic optical text recognition (AOTR), although lagging behind that of other languages, is becoming more intensive than before, and commercial systems for AOTR are becoming available. This paper presents a comprehensive survey and bibliography of research on AOTR, covering all the research publications on AOTR to which the authors had access. This paper introduces the general topic of optical character recognition (OCR) and highlights the characteristics of Arabic text. It also presents a historical review of Arabic text recognition systems. Further, this paper reports on the state of the art in AOTR research and lists the specifications of commercially available systems for AOTR. In this paper, we first underline the capabilities of different AOTR systems, and then introduce a five-stage model for AOTR systems and classify research work according to this model. We devote a section to each of the stages of this model: preprocessing, segmentation, feature extraction, classification, and post-processing. In the preprocessing section, we emphasize handling degraded documents and thinning of Arabic text. In the segmentation section, we discuss methods of segmenting Arabic text and categorize the methods into five general approaches. In the feature extraction and classification sections, we highlight the main techniques and analyze AOTR research works based on those techniques. We then discuss approaches for post-processing and show their relation to the Arabic language. We conclude by pointing out problems and directions for future research on AOTR.
An overview of character recognition focused on off-line handwriting
2001, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews
