Abstract
Several supervised and unsupervised methods have been applied to character recognition. In this research we focus on unsupervised methods that group similar characters together. Instead of traditional clustering algorithms, which are mainly restricted to globular-shaped clusters, we use an efficient distance-based clustering algorithm that identifies the natural shapes of clusters according to their densities. In character recognition, where different writing styles for the same character are natural, the algorithm can therefore discover the continuity between character feature vectors, which traditional algorithms cannot. This paper introduces the use of an algorithm that efficiently finds arbitrary-shaped clusters of characters and compares it to related algorithms. Two character recognition data sets are used to illustrate the efficiency of the suggested algorithm.
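To make the contrast between globular and arbitrary-shaped clusters concrete, the following minimal sketch labels two concentric rings of points with a generic DBSCAN-style density procedure. It is not the algorithm proposed in this paper; the helper names (`ring`, `centroid`, `density_clusters`), the synthetic data, and the parameters `eps` and `min_pts` are assumptions chosen purely for illustration.

```python
import math
from collections import deque


def ring(radius, n):
    """Return n evenly spaced 2-D points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def centroid(points):
    """Return the arithmetic mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def density_clusters(points, eps, min_pts):
    """Generic DBSCAN-style labelling: -1 marks noise, 0..k-1 mark clusters."""
    n = len(points)
    # Brute-force eps-neighbourhoods; every point counts itself as a neighbour.
    neighbours = [[j for j in range(n)
                   if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep growing
        cluster += 1
    return labels


# Hypothetical data: two concentric rings, i.e. two non-globular clusters.
inner, outer = ring(1.0, 40), ring(3.0, 120)
points = inner + outer
labels = density_clusters(points, eps=0.4, min_pts=3)

# Both rings share (almost exactly) the same centroid, so a method that
# assigns points to the nearest cluster centre cannot tell them apart.
inner_centre, outer_centre = centroid(inner), centroid(outer)
```

Because neighbouring points along each ring lie closer together than `eps` while the rings themselves are farther apart than `eps`, the density procedure follows each ring around and keeps the two apart, even though their centroids coincide. On real character feature vectors, `eps` and `min_pts` would need to be tuned to the data.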
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yousri, N.A., Kamel, M.S., Ismail, M.A. (2008). Finding Arbitrary Shaped Clusters for Character Recognition. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2008. Lecture Notes in Computer Science, vol 5112. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69812-8_59
DOI: https://doi.org/10.1007/978-3-540-69812-8_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69811-1
Online ISBN: 978-3-540-69812-8
eBook Packages: Computer Science (R0)