A Speech Recognizer Based on Multiclass SVMs with HMM-Guided Segmentation

Martín-Iglesias, D.; Bernal-Chaves, J.; Peláez-Moreno, C.; Gallardo-Antolín, A.; Díaz-de-María, F.

doi:10.1007/11613107_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3817)

Included in the following conference series:

International Conference on Nonlinear Analyses and Algorithms for Speech Processing

Abstract

Automatic Speech Recognition (ASR) is essentially a pattern classification problem; however, the time dimension of the speech signal has prevented ASR from being posed as a simple static classification problem. Support Vector Machine (SVM) classifiers could provide an appropriate solution, since they are very well suited to high-dimensional classification problems. Nevertheless, using SVMs for ASR is by no means straightforward, mainly because SVM classifiers require a fixed-dimension input. In this paper we study the use of an HMM-based segmentation as a means of obtaining the fixed-dimension input vectors required by SVMs, in an isolated-digit recognition task. Different configurations of all the parameters involved have been tested. We also address the problem of multi-class classification (as SVMs are originally binary classifiers), studying two of the most popular approaches: 1-vs-all and 1-vs-1.
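The two ideas in the abstract can be illustrated with a minimal sketch. It assumes that segment boundaries have already been obtained from an HMM alignment, and it averages the frames within each segment before concatenating the averages; the abstract does not say how the paper builds its vectors, so this mean-pooling step, the function names `segment_means` and `n_binary_classifiers`, and the toy data are all hypothetical. The second function only counts the binary SVMs each multi-class scheme needs, assuming a ten-class digit vocabulary for the example.

```python
from typing import List, Sequence

Frame = List[float]


def segment_means(frames: Sequence[Frame], boundaries: Sequence[int]) -> List[float]:
    """Map a variable-length utterance to a fixed-dimension vector.

    `boundaries` holds the start index of each segment plus a final end
    index, e.g. [0, 2, 3, 5] describes three segments. In this sketch the
    boundaries are assumed to come from an HMM alignment computed elsewhere.
    """
    if len(boundaries) < 2:
        raise ValueError("need at least one segment")
    dim = len(frames[0])
    vector: List[float] = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if not 0 <= start < end <= len(frames):
            raise ValueError("segments must be non-empty and in range")
        segment = frames[start:end]
        # Assumption: represent each segment by the mean of its frames.
        for d in range(dim):
            vector.append(sum(f[d] for f in segment) / len(segment))
    return vector


def n_binary_classifiers(n_classes: int, scheme: str) -> int:
    """Number of binary SVMs needed by each multi-class scheme."""
    if scheme == "1-vs-all":
        return n_classes  # one classifier per class against the rest
    if scheme == "1-vs-1":
        return n_classes * (n_classes - 1) // 2  # one per pair of classes
    raise ValueError("scheme must be '1-vs-all' or '1-vs-1'")


# Two toy utterances of different length, 2 features per frame, 3 segments each.
short_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
long_utt = [[float(i), float(i + 1)] for i in range(9)]

short_vec = segment_means(short_utt, [0, 2, 3, 5])
long_vec = segment_means(long_utt, [0, 3, 5, 9])
```

Both utterances end up as vectors of length 3 segments × 2 features = 6, regardless of their duration, which is the property a fixed-input classifier needs. For ten classes, 1-vs-all requires 10 binary classifiers and 1-vs-1 requires 45.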

Author information

Authors and Affiliations

Signal Theory and Communications Department, EPS-Universidad Carlos III de Madrid, Avda. de la Universidad, 30, 28911, Leganés (Madrid), Spain
D. Martín-Iglesias, J. Bernal-Chaves, C. Peláez-Moreno, A. Gallardo-Antolín & F. Díaz-de-María

Editor information

Editors and Affiliations

Escola Universitària Politècnica de Mataró, UPC, Spain
Marcos Faundez-Zanuy
Escola Universitària Politècnica de Mataró, Spain
Léonard Janer & Antonio Satue-Villar
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
The Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
Josep Roure
Escola Universitària Politècnica de Mataró (UPC), Barcelona, Spain
Virginia Espinosa-Duro

Cite this paper

Martín-Iglesias, D., Bernal-Chaves, J., Peláez-Moreno, C., Gallardo-Antolín, A., Díaz-de-María, F. (2006). A Speech Recognizer Based on Multiclass SVMs with HMM-Guided Segmentation. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_22

DOI: https://doi.org/10.1007/11613107_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)
