Abstract
This paper addresses the structural clustering of string patterns. Adopting the grammar formalism to represent both individual sequences and sets of patterns, it proposes a partitional clustering algorithm. The performance of the new algorithm is analyzed against the corresponding hierarchical version in terms of computational complexity and data partitioning results. Theoretical analysis shows that the new algorithm is considerably more efficient computationally. Unlike the hierarchical approach, its clustering results depend on the order in which patterns are presented, which may degrade performance; this effect is overcome by adopting a resampling technique. The methods are evaluated empirically on application examples by matching clusters between pairs of partitions and computing an index of cluster agreement.
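The abstract does not define the agreement index, so the following is only a minimal sketch of one plausible way to match clusters between two partitions and score their agreement. The function name `agreement_index`, the greedy largest-overlap matching, and the normalization by the number of patterns are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def agreement_index(labels_a, labels_b):
    """Fraction of patterns that fall in matched clusters of two partitions.

    labels_a[i] and labels_b[i] are the cluster labels that partitions A and B
    assign to pattern i. Hypothetical helper, not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same patterns")
    if not labels_a:
        raise ValueError("partitions must be non-empty")

    # Contingency counts: number of patterns placed in cluster a by A
    # and in cluster b by B.
    overlap = Counter(zip(labels_a, labels_b))

    used_a, used_b = set(), set()
    matched = 0
    # Greedy one-to-one matching: consider the largest overlaps first and
    # pair two clusters only if neither has been matched already.
    for (a, b), count in sorted(overlap.items(), key=lambda kv: -kv[1]):
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            matched += count

    return matched / len(labels_a)


# Same grouping under different label names: full agreement.
same = agreement_index([0, 0, 1, 1], [1, 1, 0, 0])
# One pattern moved to another cluster: three of four patterns agree.
partial = agreement_index([0, 0, 1, 1], [0, 0, 0, 1])
```

Greedy matching is simple but not guaranteed to maximize the total overlap. An optimal one-to-one assignment over the same contingency counts could be substituted if exact matching matters.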
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fred, A.L.N., Leitão, J.M.N. (2000). Partitional vs Hierarchical Clustering Using a Minimum Grammar Complexity Approach. In: Ferri, F.J., Iñesta, J.M., Amin, A., Pudil, P. (eds) Advances in Pattern Recognition. SSPR /SPR 2000. Lecture Notes in Computer Science, vol 1876. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44522-6_20
DOI: https://doi.org/10.1007/3-540-44522-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67946-2
Online ISBN: 978-3-540-44522-7
eBook Packages: Springer Book Archive