Abstract
The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and time-warped feature sequences. The novelty is to associate with each SOM node an entire sequence of feature vectors, instead of a single feature vector, as its model. Dynamic time warping (DTW) is used to obtain time-normalized distances between sequences of different lengths. Starting from a random initialization, ordered maps of feature sequences emerge, after which LVQ can be used to fine-tune the prototype sequences for optimal class separation. The resulting SOM models, i.e., the prototype sequences, can then be used for both the recognition and the synthesis of patterns. Good results have been obtained in speaker-independent speech recognition.
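The abstract combines three ingredients: DTW distances between sequences of different lengths, SOM nodes whose models are whole sequences, and LVQ fine-tuning of those sequences. The following is a minimal Python sketch of how these pieces could fit together; it is not the authors' implementation. It assumes a Euclidean local distance, the basic three-step DTW pattern with the accumulated cost divided by the warping-path length as the time normalization, a Gaussian neighborhood on the map grid, and an update in which each model vector moves toward the mean of the input vectors that DTW aligns to it. The function names `dtw`, `aligned_targets`, `best_match`, `som_step` and `lvq1_step` are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(seq_a, seq_b):
    """Return (time-normalized distance, warping path) between two sequences.

    Assumption: basic step pattern (diagonal, horizontal, vertical); the
    accumulated cost is divided by the number of aligned pairs on the path.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclid(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end to recover the optimal path of (i, j) index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m] / len(path), path


def aligned_targets(x, model):
    """For each model vector, the mean of the input vectors DTW aligns to it."""
    _, path = dtw(x, model)
    targets = []
    for t in range(len(model)):
        vecs = [x[i] for i, j in path if j == t]  # never empty: path covers all j
        targets.append([sum(c) / len(vecs) for c in zip(*vecs)])
    return targets


def best_match(models, x):
    """Index of the node whose model sequence is DTW-closest to x."""
    return min(range(len(models)), key=lambda k: dtw(x, models[k])[0])


def som_step(models, grid, x, lr, radius):
    """One online SOM update with sequence-valued models (models modified in place).

    Assumption: every node aligns x to its own model before updating.
    """
    winner = best_match(models, x)
    for k, model in enumerate(models):
        g2 = sum((a - b) ** 2 for a, b in zip(grid[k], grid[winner]))
        h = lr * math.exp(-g2 / (2.0 * radius ** 2))  # Gaussian neighborhood
        for t, target in enumerate(aligned_targets(x, model)):
            model[t] = [mv + h * (a - mv) for mv, a in zip(model[t], target)]
    return winner


def lvq1_step(models, labels, x, x_label, lr):
    """LVQ1-style update: pull the winner toward x if the labels agree, else push it away."""
    winner = best_match(models, x)
    sign = 1.0 if labels[winner] == x_label else -1.0
    model = models[winner]
    for t, target in enumerate(aligned_targets(x, model)):
        model[t] = [mv + sign * lr * (a - mv) for mv, a in zip(model[t], target)]
    return winner
```

To train a map under these assumptions, one would initialize every node with a fixed-length sequence of random vectors, call `som_step` over the training sequences while gradually shrinking `lr` and `radius`, and then run `lvq1_step` on labeled sequences with a small `lr`. Because each node keeps a fixed-length model and DTW performs the alignment, the input sequences may have any length.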
Cite this article
Somervuo, P., Kohonen, T. Self-Organizing Maps and Learning Vector Quantization for Feature Sequences. Neural Processing Letters 10, 151–159 (1999). https://doi.org/10.1023/A:1018741720065