Abstract
This paper presents a foundation for optimal, information-theoretic syntactic Pattern Recognition (PR) of patterns that can be “linearly” represented as strings. In an earlier paper (Oommen and Kashyap [25]) we presented a formal basis for designing such systems when the errors involved were arbitrarily distributed Substitution, Insertion and Deletion (SID) syntactic errors. Here we generalize that framework to permit Generalized Transposition (GT) errors in addition to these traditional errors. We do so by developing a rigorous model, MG*, for channels that permit all of these errors in an arbitrarily distributed manner. The scheme is Functionally Complete and stochastically consistent. Beyond these synthesis aspects, we also analyze the model and derive a dynamic programming technique that computes Pr[Y|U], the probability of receiving Y given that U was transmitted, in quartic time. Experimental results on dictionaries with strings of lengths between 7 and 14, with an overall average noise of 70.5%, demonstrate the superiority of our system over existing methods.
Partially supported by the Natural Sciences and Engineering Research Council of Canada.
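To make the four error types named in the abstract concrete, the following is a minimal sketch of a dynamic program over substitutions, insertions, deletions and plain transpositions of two adjacent symbols. It is not the MG* channel model and not the quartic-time computation of Pr[Y|U]: it is the well-known unit-cost "optimal string alignment" distance, which runs in quadratic time and returns a cost rather than a probability. The function name `osa_distance` and the unit costs are assumptions made purely for illustration.

```python
def osa_distance(u: str, y: str) -> int:
    """Unit-cost distance between u and y allowing substitution, insertion,
    deletion, and transposition of two adjacent symbols.

    Illustrative only: this is the optimal string alignment distance,
    not the probabilistic MG* model described in the abstract.
    """
    n, m = len(u), len(y)
    # d[i][j] holds the cost of transforming u[:i] into y[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i symbols of u[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j symbols of y[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if u[i - 1] == y[j - 1] else 1
            best = min(
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
                d[i - 1][j - 1] + sub_cost,  # substitution or match
            )
            # Transposition of two adjacent symbols, e.g. "ab" -> "ba".
            if (i > 1 and j > 1
                    and u[i - 1] == y[j - 2]
                    and u[i - 2] == y[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    print(osa_distance("abcd", "abdc"))
    print(osa_distance("kitten", "sitting"))
```

A probabilistic analogue would replace the `min` over costs with sums over products of error probabilities; the details of how the paper does this for GT errors are given in the paper itself and are not reproduced here.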
Abridged list of references
R. L. Bahl and F. Jelinek, Decoding with channels with insertions, deletions and substitutions with applications to speech recognition, IEEE T Inf. Th., IT-21:404–411 (1975).
H. Bunke and J. Csirik, Parametric string edit distance and its application to pattern recognition, IEEE T. Syst., Man and Cybern., SMC-25:202–206 (1993).
L. Devroye, Non-Uniform Random Variate Generation, Springer-Verlag, (1986).
G. Dewey, Relative Frequency of English Speech Sounds, Harvard Univ. Press, (1923).
R. O. Duda, P.E. Hart. Pattern Classification and Scene Analysis. Wiley & Sons, 1973.
G.D. Forney, The Viterbi Algorithm, Proceedings of the IEEE, Vol. 61. (1973).
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
P. A. V. Hall and G.R. Dowling, Approximate string matching, Comp. Sur., 12:381–402 (1980).
R. L. Kashyap and B. J. Oommen, A common basis for similarity and dissimilarity measures involving two strings, Internat. J. Comput. Math., 13:17–40 (1983).
R. L. Kashyap and B. J. Oommen, An effective algorithm for string correction using generalized edit distances-I. Description of the algorithm and its optimality, Inf. Sci., 23(2): 123–142 (1981).
R. L. Kashyap and B. J. Oommen, String correction using probabilistic methods, Pattern Recognition Letters, 147–154 (1984).
R. Lowrance and R. A. Wagner, An extension of the string to string correction problem, J. Assoc. Comput. Mach., 22:177–183 (1975).
A. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Phys. Dokl., 10:707–710 (1966).
W. J. Masek and M. S. Paterson, A faster algorithm computing string edit distances, J. Comput. System Sci., 20:18–31 (1980).
D. L. Neuhoff, The Viterbi algorithm as an aid in text recognition, IEEE T. Inf. Th., 222–226 (1975).
T. Okuda, E. Tanaka, and T. Kasai, A method of correction of garbled words based on the Levenshtein metric, IEEE T. Comput., C-25:172–177 (1976).
B. J. Oommen and R. K. S. Loke, Pattern recognition of strings containing traditional and generalized transposition errors, Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics, Vancouver, October 1995, pp. 1154–1159.
B. J. Oommen and R. Loke, Information Theoretic Syntactic Pattern Recognition Involving Traditional and Transposition Errors, Unabridged version of the present paper.
D. Sankoff and J. B. Kruskal, Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley (1983).
R. Shinghal and G. T. Toussaint, Experiments in text recognition with the modified Viterbi algorithm, IEEE T. on Pat. An. and M. Intel., 184–192 (1979).
S. Srihari, Computer Text Recognition and Error Correction, IEEE Computer Press, (1984).
A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimal decoding algorithm, IEEE T. on Information Theory, 260–26 (1967).
R. A. Wagner and M. J. Fischer, The string to string correction problem, J. Assoc. Comput. Mach., 21:168–173 (1974).
K. S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York, 1974.
B. J. Oommen and R. L. Kashyap, Optimal and information theoretic syntactic pattern recognition for traditional errors, to appear in the Proceedings of SSPR-96, the 1996 International Symposium on Syntactic and Structural Pattern Recognition, Leipzig, Germany, August 1996.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oommen, B.J., Loke, R.K.S. (1996). Optimal and information theoretic syntactic Pattern Recognition involving traditional and transposition errors. In: Chandru, V., Vinay, V. (eds) Foundations of Software Technology and Theoretical Computer Science. FSTTCS 1996. Lecture Notes in Computer Science, vol 1180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62034-6_52
DOI: https://doi.org/10.1007/3-540-62034-6_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62034-1
Online ISBN: 978-3-540-49631-1
eBook Packages: Springer Book Archive