How to Add Word Classes to the Kaldi Speech Recognition Toolkit

Horndasch, Axel; Kaufhold, Caroline; Nöth, Elmar

doi:10.1007/978-3-319-45510-5_56

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

The paper explains and illustrates how the concept of word classes can be added to the widely used open-source speech recognition toolkit Kaldi. The suggested extensions to existing Kaldi recipes are limited to the word-level grammar (G) and the pronunciation lexicon (L) models. The implementation, which modifies the weighted finite-state transducers employed in Kaldi, uses the OpenFst library. In experiments on small and mid-sized corpora with vocabulary sizes of 1.5 K and 5.5 K, respectively, a slight improvement in word error rate is observed when the approach is tested with hand-crafted word classes. Furthermore, it is shown that introducing sub-word unit models for open word classes can help to robustly detect and classify out-of-vocabulary words without impairing word recognition accuracy.
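To make the idea of word classes in the grammar more concrete, the following is a minimal sketch of the standard class-based bigram factorization, P(w | v) = P(c(w) | c(v)) · P(w | c(w)). It is not taken from the paper: the class name `CITY`, the toy probabilities, and the helper names are all hypothetical, and the paper's actual implementation operates on the G and L transducers via OpenFst rather than on Python dictionaries. The sketch also shows the negative-log cost that weighted finite-state toolkits commonly store on arcs.

```python
import math

# Hypothetical toy data (assumption, not from the paper).
# Words outside a hand-crafted class act as their own singleton class.
word_to_class = {
    "berlin": "CITY",
    "hamburg": "CITY",
    "to": "to",
    "from": "from",
}

# P(class | previous class): a toy bigram model over class labels.
class_bigram = {
    ("to", "CITY"): 0.8,
    ("from", "CITY"): 0.7,
}

# P(word | class): distribution over the members of each class.
word_given_class = {
    ("berlin", "CITY"): 0.6,
    ("hamburg", "CITY"): 0.4,
}


def class_bigram_prob(prev_word: str, word: str) -> float:
    """Return P(word | prev_word) under the class-based factorization."""
    prev_class = word_to_class[prev_word]
    cls = word_to_class[word]
    p_class = class_bigram[(prev_class, cls)]
    # A singleton class has exactly one member, so P(w | c) = 1.
    p_member = word_given_class.get((word, cls), 1.0)
    return p_class * p_member


def neg_log_cost(prob: float) -> float:
    """Convert a probability to a negative-log cost (costs add along a path)."""
    return -math.log(prob)


if __name__ == "__main__":
    p = class_bigram_prob("to", "berlin")
    print(f"P(berlin | to) = {p:.2f}, cost = {neg_log_cost(p):.4f}")
```

Because the class-level and member-level factors multiply, their negative-log costs add. This is why a class label in a grammar can, in principle, be expanded into a small sub-graph over the class members without changing the rest of the model.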

Notes

1. See, for example, https://sourceforge.net/p/kaldi/discussion/1355348/thread/c7c5e4f6/.
2. To save resources, it was decided during the design phase of Evar that the system should only provide information about express trains (so-called IC/ICE trains). As a consequence, only city names with an express train station were included in the recognizer's vocabulary. While this may seem intuitive at first, it led to a large number of OOVs because even cooperative users were not always sure which cities had express train stops.

Author information

Authors and Affiliations

Lehrstuhl für Informatik 5 (Mustererkennung), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 3, 91058, Erlangen, Germany
Axel Horndasch, Caroline Kaufhold & Elmar Nöth

Corresponding author

Correspondence to Axel Horndasch.

Editor information

Editors and Affiliations

Masaryk University, Brno, Czech Republic
Petr Sojka
Masaryk University, Brno, Czech Republic
Aleš Horák
Masaryk University, Brno, Czech Republic
Ivan Kopeček
Masaryk University, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Horndasch, A., Kaufhold, C., Nöth, E. (2016). How to Add Word Classes to the Kaldi Speech Recognition Toolkit. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_56

DOI: https://doi.org/10.1007/978-3-319-45510-5_56
Published: 03 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science, Computer Science (R0)
