Abstract
The hidden Markov model (HMM) is a state-of-the-art model for automatic speech recognition. Although it has shown good results in past experiments, the conditional independence of observations given the state that the HMM assumes is known not to hold for speech. One way to partly alleviate this problem is to concatenate each observation with its adjacent neighbors. In this article, we look at a novel way to perform this concatenation that takes the frequency of the features into account. The approach was evaluated on spoken connected-digit data, and the results show an absolute increase in classification accuracy of 4.63% on average for the best model.
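To make the baseline idea concrete, the sketch below shows plain neighbor concatenation (often called frame stacking), in which every frame is joined with a fixed number of preceding and succeeding frames. It is not the frequency-based scheme proposed in the paper, whose details the abstract does not give. The function name `stack_context`, the `left` and `right` parameters, and the choice to repeat the first and last frames at the edges are all assumptions made for illustration.

```python
from typing import List, Sequence


def stack_context(
    frames: Sequence[Sequence[float]], left: int = 1, right: int = 1
) -> List[List[float]]:
    """Concatenate each frame with its `left` previous and `right` next frames.

    `frames` holds T feature vectors of equal length D. The result holds T
    vectors of length (left + 1 + right) * D. Edge handling (repeating the
    first or last frame) is an assumption of this sketch.
    """
    if left < 0 or right < 0:
        raise ValueError("left and right must be non-negative")
    num_frames = len(frames)
    stacked: List[List[float]] = []
    for t in range(num_frames):
        row: List[float] = []
        for offset in range(-left, right + 1):
            # Clamp the index so that out-of-range neighbors reuse the
            # first or last frame of the utterance.
            idx = min(max(t + offset, 0), num_frames - 1)
            row.extend(frames[idx])
        stacked.append(row)
    return stacked
```

With three two-dimensional frames and one neighbor on each side, every output vector has six entries. A frequency-aware variant would presumably vary the context per feature dimension, but that is speculation beyond what the abstract states.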
© 2014 Springer International Publishing Switzerland
Cite this paper
Trottier, L., Chaib-draa, B., Giguère, P. (2014). Effects of Frequency-Based Inter-frame Dependencies on Automatic Speech Recognition. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science(), vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_38
DOI: https://doi.org/10.1007/978-3-319-06483-3_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06482-6
Online ISBN: 978-3-319-06483-3
eBook Packages: Computer Science, Computer Science (R0)