Abstract
The wavelet transform has been used for feature extraction in many pattern recognition applications. However, learning algorithms are generally not designed to take into account the properties of the features obtained with the discrete wavelet transform. In this work we propose a Markovian model to classify sequences of frames in the wavelet domain. The architecture is a composite model: an external hidden Markov model whose observation probabilities are provided by a set of hidden Markov trees. Training algorithms are developed for the composite model using the expectation-maximization framework. We also evaluate a novel delay-invariant representation to improve wavelet feature extraction for classification tasks. The proposed methods can be easily extended to model sequences of images. Here we present phoneme recognition experiments with the TIMIT speech corpus. The robustness of the proposed architecture and learning method was tested by reducing the amount of training data to a few patterns. Recognition rates were better than those of hidden Markov models with observation densities based on Gaussian mixtures.
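To make the composite architecture more concrete, the following is a minimal sketch of how an external hidden Markov model can score a sequence of frames in the wavelet domain when each state delegates its observation likelihood to a separate model. It is not the authors' implementation. The function names (`haar_dwt`, `gaussian_loglik`, `forward_loglik`), the choice of the Haar wavelet, and the independent-Gaussian emission are all assumptions made for illustration; in the paper the per-state emission is a hidden Markov tree, which would replace `gaussian_loglik` here.

```python
import math


def haar_dwt(frame):
    """Full Haar decomposition of a frame whose length is a power of two.

    Returns the detail coefficients, coarsest scale first (N - 1 values).
    The Haar wavelet is an assumption; the abstract does not name one.
    """
    approx = list(frame)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Prepend so that coarser scales end up at the front of the list.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def gaussian_loglik(coeffs, mean, var):
    """Stand-in emission: independent Gaussians over the coefficients.

    This is NOT a hidden Markov tree; it only marks where one would plug in.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (c - mean) ** 2 / var)
        for c in coeffs
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(frames, log_init, log_trans, emission_logliks):
    """Log-likelihood of a frame sequence under the external HMM.

    emission_logliks[j] is a callable returning the log-likelihood of one
    frame's wavelet coefficients under the model attached to state j.
    """
    n = len(log_init)
    feats = [haar_dwt(f) for f in frames]
    # Initialisation of the forward variable in the log domain.
    alpha = [log_init[j] + emission_logliks[j](feats[0]) for j in range(n)]
    # Induction over the remaining frames.
    for x in feats[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            + emission_logliks[j](x)
            for j in range(n)
        ]
    return logsumexp(alpha)
```

For classification, one such composite model would be scored per class (for example, per phoneme) and the class with the highest `forward_loglik` selected. Keeping the emission as a callable is a design choice of this sketch: it lets the external recursion stay unchanged when the stand-in is swapped for a tree-structured model.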
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Milone, D.H., Di Persia, L.E. (2007). An EM Algorithm to Learn Sequences in the Wavelet Domain. In: Gelbukh, A., Kuri Morales, Á.F. (eds) MICAI 2007: Advances in Artificial Intelligence. MICAI 2007. Lecture Notes in Computer Science, vol 4827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76631-5_49
DOI: https://doi.org/10.1007/978-3-540-76631-5_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76630-8
Online ISBN: 978-3-540-76631-5
eBook Packages: Computer Science (R0)