Recognition of Multiple Language Voice Navigation Queries in Traffic Situations

Sárosi, Gellért; Mozsolics, Tamás; Tarján, Balázs; Balog, András; Mihajlik, Péter; Fegyó, Tibor

doi:10.1007/978-3-642-25775-9_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6800)

Abstract

This paper introduces our work and results on a multiple-language continuous speech recognition task. The aim was to design a system that produces a tolerable number of recognition errors on point-of-interest words in voice navigation queries, even in the presence of real-life traffic noise. An additional challenge was that no task-specific training databases were available for language or acoustic modeling. Instead, general-purpose acoustic databases were obtained for the acoustic models, and (probabilistic) context-free grammars were constructed for the language models. A public pronunciation lexicon was used for English, whereas rule- and exception-dictionary-based pronunciation modeling was applied for French, German, Italian, Spanish and Hungarian. For the last four of these languages (German, Italian, Spanish and Hungarian), the classical phoneme-based pronunciation modeling approach was also compared with a grapheme-based pronunciation modeling technique. Noise robustness was addressed by applying various feature extraction methods. The results show that achieving high word recognition accuracy is feasible if cooperative speakers can be assumed.
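To make the contrast between the two pronunciation modeling approaches concrete, here is a minimal Python sketch. It is not the authors' tooling: the exception dictionary, the heavily simplified Hungarian letter-to-sound rules, the SAMPA-style symbols and the function names `phoneme_pronunciation` and `grapheme_pronunciation` are all hypothetical and chosen only for illustration. The phoneme-based path first looks a word up in the exception dictionary and otherwise applies rewrite rules, longest match first (so the digraph "sz" wins over the single letter "s"); the grapheme-based path simply uses the word's letters as the acoustic modeling units.

```python
from typing import Dict, List

# Hypothetical exception dictionary: words (e.g. foreign POI names) whose
# pronunciation the letter-to-sound rules below would get wrong.
EXCEPTIONS: Dict[str, List[str]] = {
    "tesco": ["t", "E", "s", "k", "o"],  # the rules would map the letter "s" to "S"
}

# Hypothetical, heavily simplified Hungarian letter-to-sound rules
# (SAMPA-style symbols). Multi-letter keys are digraphs.
RULES: Dict[str, List[str]] = {
    # digraphs
    "sz": ["s"], "zs": ["Z"], "cs": ["tS"], "gy": ["J\\"], "ny": ["J"],
    "ty": ["c"], "ly": ["j"],
    # vowels
    "a": ["O"], "á": ["a:"], "e": ["E"], "é": ["e:"], "i": ["i"], "í": ["i:"],
    "o": ["o"], "ó": ["o:"], "ö": ["2"], "ő": ["2:"], "u": ["u"], "ú": ["u:"],
    "ü": ["y"], "ű": ["y:"],
    # single consonants
    "b": ["b"], "c": ["ts"], "d": ["d"], "f": ["f"], "g": ["g"], "h": ["h"],
    "j": ["j"], "k": ["k"], "l": ["l"], "m": ["m"], "n": ["n"], "p": ["p"],
    "r": ["r"], "s": ["S"], "t": ["t"], "v": ["v"], "z": ["z"],
}

# Length of the longest rule key, used for longest-match-first lookup.
MAX_RULE_LEN = max(len(key) for key in RULES)


def phoneme_pronunciation(word: str) -> List[str]:
    """Phoneme-based lexicon entry: exception dictionary first, then rules."""
    w = word.lower()
    if w in EXCEPTIONS:
        return list(EXCEPTIONS[w])
    phones: List[str] = []
    i = 0
    while i < len(w):
        # Try the longest possible grapheme sequence first.
        for n in range(min(MAX_RULE_LEN, len(w) - i), 0, -1):
            chunk = w[i:i + n]
            if chunk in RULES:
                phones.extend(RULES[chunk])
                i += n
                break
        else:
            # No rule covers this character: keep it as a placeholder unit.
            phones.append(w[i])
            i += 1
    return phones


def grapheme_pronunciation(word: str) -> List[str]:
    """Grapheme-based lexicon entry: every letter is its own acoustic unit."""
    return [ch for ch in word.lower() if ch.isalpha()]


if __name__ == "__main__":
    for name in ["Szeged", "Győr", "Tesco"]:
        print(name, phoneme_pronunciation(name), grapheme_pronunciation(name))
```

With these toy rules, "Szeged" becomes `s E g E d` in the phoneme-based lexicon but `s z e g e d` in the grapheme-based one. The grapheme-based variant needs neither a rule table nor an exception dictionary, which is one plausible reason to compare it when no expert lexicon is available; the price is that digraphs such as "sz" are left for the (context-dependent) acoustic models to resolve.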

Author information

Authors and Affiliations

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary
Gellért Sárosi, Tamás Mozsolics, Balázs Tarján, András Balog, Péter Mihajlik & Tibor Fegyó
THINKTech Research Center Nonprofit LLC., Hungary
Tamás Mozsolics, András Balog & Péter Mihajlik
Aitia International Inc., Hungary
Tibor Fegyó

Editor information

Editors and Affiliations

Dept. of Psychology and IIASS, International Institute for Advanced Scientific Studies, Second University of Naples, Vietri sul Mare, SA, Italy
Anna Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Department of Telecommunication and Media Informatics, Laboratory of Speech Acoustics, Budapest University of Technology and Economics, 1117, Budapest, Hungary
Klára Vicsi
TELECOM ParisTech, CNRS-LTCI UMR 5141, 75014, Paris, France
Catherine Pelachaud
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7500 AE, Enschede, The Netherlands
Anton Nijholt

About this paper

Cite this paper

Sárosi, G., Mozsolics, T., Tarján, B., Balog, A., Mihajlik, P., Fegyó, T. (2011). Recognition of Multiple Language Voice Navigation Queries in Traffic Situations. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_20

DOI: https://doi.org/10.1007/978-3-642-25775-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science, Computer Science (R0)
