Vietnamese Automatic Speech Recognition: The FLaVoR Approach

Vu, Quan; Demuynck, Kris; Van Compernolle, Dirk

doi:10.1007/11939993_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

Automatic speech recognition for languages in Southeast Asia, including Chinese, Thai and Vietnamese, typically models both acoustics and language at the syllable level. This paper presents a new approach to recognizing those languages that exploits information at the word level. The new approach, adapted from our FLaVoR architecture [1], consists of two layers. In the first layer, a purely acoustic-phonemic search generates a dense phoneme network enriched with metadata. In the second layer, word decoding is performed on the composition of a series of finite state transducers (FSTs), which combines knowledge sources at the sub-lexical, word-lexical and word-based language model levels. Experimental results on the Vietnamese Broadcast News corpus show that our approach is both effective and flexible.
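The two-layer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `FST` class, the helper functions, the toneless toy syllables and every cost below are assumptions made up for illustration. A small phoneme network (standing in for the first layer's output) is composed with a phoneme-to-syllable transducer and then with a syllable-to-word transducer whose costs stand in for a word-based language model. Composition here is naive and uses no epsilon filter, which is acceptable only because the search takes the minimum cost, so redundant epsilon paths cannot change the result.

```python
import heapq
from collections import defaultdict

EPS = "<eps>"  # empty label

class FST:
    """Weighted transducer over the tropical semiring (costs add, lowest wins)."""
    def __init__(self, start, finals):
        self.start, self.finals = start, set(finals)
        self.arcs = defaultdict(list)  # state -> [(in, out, cost, next_state)]

    def add(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))

def phoneme_network(slots):
    """Linear network: slots[i] lists the (phoneme, cost) alternatives at step i."""
    fst = FST(0, [len(slots)])
    for i, alternatives in enumerate(slots):
        for phoneme, cost in alternatives:
            fst.add(i, phoneme, phoneme, cost, i + 1)
    return fst

def lexicon(entries):
    """Closed loop mapping each input sequence to one output label with a cost."""
    fst, fresh = FST(0, [0]), 1
    for out, seq, cost in entries:
        src = 0
        for k, sym in enumerate(seq):
            dst = 0 if k == len(seq) - 1 else fresh
            fresh += dst != 0
            # The output label and the entry cost sit on the first arc only.
            fst.add(src, sym, out if k == 0 else EPS, cost if k == 0 else 0.0, dst)
            src = dst
    return fst

def compose(a, b):
    """Match a's output labels against b's input labels (no epsilon filter)."""
    start = (a.start, b.start)
    result, stack, seen = FST(start, []), [start], {start}
    while stack:
        qa, qb = state = stack.pop()
        if qa in a.finals and qb in b.finals:
            result.finals.add(state)
        moves = []
        for i, o, w, na in a.arcs[qa]:
            if o == EPS:  # a emits nothing, so only a advances
                moves.append((i, EPS, w, (na, qb)))
            else:
                moves += [(i, o2, w + w2, (na, nb))
                          for i2, o2, w2, nb in b.arcs[qb] if i2 == o]
        # b consumes nothing, so only b advances
        moves += [(EPS, o2, w2, (qa, nb)) for i2, o2, w2, nb in b.arcs[qb] if i2 == EPS]
        for i, o, w, nxt in moves:
            result.add(state, i, o, w, nxt)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return result

def best_path(fst):
    """Dijkstra search (non-negative costs); returns (cost, output labels) or None."""
    heap, done, tick = [(0.0, 0, fst.start, ())], set(), 1
    while heap:
        cost, _, state, out = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state in fst.finals:
            return cost, list(out)
        for _, o, w, nxt in fst.arcs[state]:
            if nxt not in done:
                step = () if o == EPS else (o,)
                heapq.heappush(heap, (cost + w, tick, nxt, out + step))
                tick += 1
    return None

# Layer 1 output (made-up costs): the acoustics slightly prefer "p" over "k".
network = phoneme_network([[("h", 0.1)], [("o", 0.2)], [("k", 0.9), ("p", 0.4)],
                           [("s", 0.1)], [("i", 0.2)], [("ng", 0.3)]])
# Sub-lexical knowledge: toneless toy syllables spelled as phoneme strings.
syllables = lexicon([("hoc", ["h", "o", "k"], 0.0), ("hop", ["h", "o", "p"], 0.0),
                     ("sinh", ["s", "i", "ng"], 0.0)])
# Word lexicon; the unigram-style costs stand in for a word language model.
words = lexicon([("hoc_sinh", ["hoc", "sinh"], 1.0), ("hoc", ["hoc"], 2.0),
                 ("hop", ["hop"], 2.0), ("sinh", ["sinh"], 3.0)])

syllable_level = compose(network, syllables)
word_level = compose(syllable_level, words)
syl_cost, syl_path = best_path(syllable_level)
word_cost, word_path = best_path(word_level)
print("syllable-level best:", syl_path, round(syl_cost, 2))
print("word-level best:", word_path, round(word_cost, 2))
```

With these toy costs, decoding at the syllable level alone follows the acoustically cheaper "p" and yields the syllables hop, sinh, whereas adding the word-level transducer makes the two-syllable word hoc_sinh the cheapest hypothesis. A production decoder would more plausibly rely on an FST toolkit with proper epsilon handling, such as those listed in the references, rather than on hand-written composition.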

References

1. Demuynck, K., Laureys, T., Van Compernolle, D., Van Hamme, H.: FLaVoR: a Flexible Architecture for LVCSR. In: Eurospeech 2003, Geneva, Switzerland, pp. 1973–1976 (2003)
2. Nguyen, H., Vu, Q.: Selection of Basic Units for Vietnamese Large Vocabulary Continuous Speech Recognition. In: The 4th IEEE International Conference on Computer Science - Research, Innovation and Vision of the Future, HoChiMinh City, Vietnam, pp. 320–326 (2006)
3. Zhang, J.Y., et al.: Improved Context-Dependent Acoustic Modeling for Continuous Chinese Speech Recognition. In: Eurospeech 2001, Aalborg, Denmark, pp. 1617–1620 (2001)
4. Suebvisai, S., et al.: Thai Automatic Speech Recognition. In: ICASSP 2005, Philadelphia, PA, USA, pp. 857–860 (2005)
5. Liu, Y., Fung, P.: Modeling partial pronunciation variations for spontaneous Mandarin speech recognition. Computer Speech and Language 17, 357–379 (2003)
6. Xiang, B., et al.: The BBN Mandarin Broadcast News Transcription System. In: InterSpeech 2005, Lisbon, Portugal, pp. 1649–1652 (2005)
7. Mohri, M., Pereira, F.C.N., Riley, M.: Weighted Finite-State Transducers in Speech Recognition. Computer Speech and Language 16(1), 69–88 (2002)
8. Kanthak, S., Ney, H.: FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation. In: ACL 2004, Barcelona, Spain, pp. 510–517 (2004)
9. Hetherington, L.: MIT Finite State Transducer Toolkit (2005), http://people.csail.mit.edu/ilh//fst/
10. Stolcke, A.: SRILM - An Extensible Language Modeling Toolkit. In: Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado, pp. 901–904 (2002)

Author information

Authors and Affiliations

K.U.Leuven/ESAT/PSI, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
Quan Vu, Kris Demuynck & Dirk Van Compernolle

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

About this paper

Cite this paper

Vu, Q., Demuynck, K., Van Compernolle, D. (2006). Vietnamese Automatic Speech Recognition: The FLaVoR Approach. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_49

DOI: https://doi.org/10.1007/11939993_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
