A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems

Dines, John; Magimai Doss, Mathew

doi:10.1007/978-3-540-78155-4_19


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4892)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction


Abstract

In this paper we present a study of automatic speech recognition systems that use context-dependent phonemes and graphemes as sub-word units, based on the conventional HMM/GMM system as well as the tandem system. Experimental studies conducted on three different continuous speech recognition tasks show that systems using only context-dependent graphemes can yield competitive performance on small to medium vocabulary tasks when compared to a context-dependent phoneme-based automatic speech recognition system. In particular, we demonstrate the utility of tandem features, which use an MLP trained to estimate phoneme posterior probabilities, in improving the performance of grapheme-based recognition systems: they implicitly incorporate phonemic knowledge into the system without requiring a phonetically transcribed lexicon.
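As a rough illustration of what a grapheme-based lexicon with context-dependent units might look like, the sketch below spells each word as its letter sequence and expands the letters into word-internal context-dependent labels. The function names, the `left-centre+right` label notation, and the handling of word boundaries are assumptions made for this example; the abstract does not specify how the systems in the paper were actually configured.

```python
from typing import Dict, List


def grapheme_pronunciation(word: str) -> List[str]:
    """Spell a word as a sequence of grapheme units (alphabetic characters only)."""
    return [ch for ch in word.lower() if ch.isalpha()]


def to_context_dependent(units: List[str]) -> List[str]:
    """Expand units into word-internal context-dependent labels.

    Assumed notation: 'left-centre+right'. A missing left or right
    context at a word boundary is simply omitted from the label.
    """
    labels = []
    for i, centre in enumerate(units):
        left = f"{units[i - 1]}-" if i > 0 else ""
        right = f"+{units[i + 1]}" if i < len(units) - 1 else ""
        labels.append(f"{left}{centre}{right}")
    return labels


def build_lexicon(vocabulary: List[str]) -> Dict[str, List[str]]:
    """Map every word to its context-dependent grapheme sequence."""
    return {w: to_context_dependent(grapheme_pronunciation(w)) for w in vocabulary}


if __name__ == "__main__":
    # Hypothetical two-word vocabulary, used only for demonstration.
    for word, units in build_lexicon(["speech", "cat"]).items():
        print(word, " ".join(units))
```

The point of the sketch is that such a lexicon is derived from orthography alone, so no phonetic transcription is needed to build it. In a phoneme-based system, the letter sequence would instead be replaced by a hand-written or automatically generated phoneme sequence.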



Author information

Authors and Affiliations

IDIAP Research Institute, P.O. Box 592, Martigny, CH-1920, Switzerland
John Dines & Mathew Magimai Doss
International Computer Science Institute, 1947 Center St, Berkeley, CA 94704
Mathew Magimai Doss


Editor information

Andrei Popescu-Belis, Steve Renals, Hervé Bourlard


About this paper

Cite this paper

Dines, J., Magimai Doss, M. (2008). A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_19


DOI: https://doi.org/10.1007/978-3-540-78155-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78154-7
Online ISBN: 978-3-540-78155-4
eBook Packages: Computer Science (R0)
