Abstract
Automatic speech recognition relies mainly on hidden Markov models (HMMs), which make little use of phonetic knowledge. As an alternative, landmark-based recognizers rely mainly on precise phonetic knowledge and exploit distinctive features. We propose a theoretical framework to combine both approaches by introducing phonetic knowledge in a non-stationary HMM decoder. To demonstrate the potential of the method, we investigate how broad phonetic landmarks can be used to improve an HMM decoder by focusing the best path search. We show that, assuming error-free landmark detection, every broad phonetic class brings a small improvement. The use of all the classes reduces the error rate from 22% to 14% on a broadcast news transcription task. We also experimentally validate that landmark boundaries do not need to be detected precisely and that the algorithm is robust to non-detection errors.
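The abstract does not spell out how landmarks focus the best path search, so the following is only a toy sketch of one plausible reading, not the authors' decoder: a plain Viterbi search in which any state whose broad phonetic class disagrees with a known landmark at that frame is pruned. The function name `constrained_viterbi`, the two-state model, the class labels and all probabilities are assumptions made up for illustration.

```python
import math

NEG_INF = float("-inf")

def constrained_viterbi(log_emit, log_trans, log_init, state_class, landmarks=None):
    """Viterbi search in which landmarks prune states of the wrong broad class.

    log_emit[t][s]  : log-likelihood of frame t under state s
    log_trans[p][s] : log transition probability from state p to state s
    log_init[s]     : log initial probability of state s
    state_class[s]  : broad phonetic class of state s (hypothetical labels)
    landmarks[t]    : class required at frame t, or None if no landmark is known
    """
    n_frames, n_states = len(log_emit), len(log_init)
    if landmarks is None:
        landmarks = [None] * n_frames

    def allowed(t, s):
        # A frame without a landmark places no constraint on the state.
        return landmarks[t] is None or landmarks[t] == state_class[s]

    delta = [log_init[s] + log_emit[0][s] if allowed(0, s) else NEG_INF
             for s in range(n_states)]
    backptr = []
    for t in range(1, n_frames):
        new_delta, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            score = delta[best_prev] + log_trans[best_prev][s] + log_emit[t][s]
            # Pruning step: states contradicting the landmark get score -inf.
            new_delta.append(score if allowed(t, s) else NEG_INF)
            ptr.append(best_prev)
        delta = new_delta
        backptr.append(ptr)

    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy model (all values are illustrative assumptions).
state_class = ["vowel", "consonant"]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.5), math.log(0.5)],
             [math.log(0.5), math.log(0.5)]]
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.6), math.log(0.4)],
            [math.log(0.2), math.log(0.8)]]

free_path, free_score = constrained_viterbi(log_emit, log_trans, log_init, state_class)
# A landmark at frame 1 says the frame belongs to the "consonant" class.
guided_path, guided_score = constrained_viterbi(
    log_emit, log_trans, log_init, state_class,
    landmarks=[None, "consonant", None])
print(free_path, guided_path)
```

In this toy case the acoustic scores alone favour the vowel state at frame 1, and the landmark overrides that choice. A frame with no landmark leaves the search unconstrained, which is one simple way a decoder of this kind could tolerate missed detections.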
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gravier, G., Moraru, D. (2007). Towards Phonetically-Driven Hidden Markov Models: Can We Incorporate Phonetic Landmarks in HMM-Based ASR?. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_13
DOI: https://doi.org/10.1007/978-3-540-77347-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science (R0)