Segment-level probabilistic sequence kernel and segment-level pyramid match kernel based extreme learning machine for classification of varying length patterns of speech

Gupta, Shikha; Karanath, Ahmed; Mahrifa, Kansul; Dileep, A. D.; Thenkanidiyoor, Veena

doi:10.1007/s10772-018-09587-1


Published: 05 February 2019

Volume 22, pages 231–249, (2019)

International Journal of Speech Technology

Shikha Gupta (ORCID: orcid.org/0000-0002-0574-2543)¹, Ahmed Karanath¹, Kansul Mahrifa¹, A. D. Dileep¹ & Veena Thenkanidiyoor²


Abstract

In this work, we address some issues in the classification of varying length patterns of speech, represented as sets of continuous-valued feature vectors, using kernel methods. Kernels designed for varying length patterns are called dynamic kernels. We propose two dynamic kernels, the segment-level pyramid match kernel (SLPMK) and the segment-level probabilistic sequence kernel (SLPSK), for classifying long duration speech represented as varying length sets of feature vectors using an extreme learning machine (ELM). SLPMK and SLPSK are designed by partitioning the speech signal into increasingly finer segments and matching the corresponding segments. SLPSK is built upon a set of Gaussian basis functions, where half of the basis functions contain class-specific information while the other half capture the characteristics common to the speech utterances of all classes. SVM training is computationally intensive: the complexity of SVM training algorithms is usually at least quadratic in the number of training examples, which makes it difficult for traditional SVMs to handle large amounts of data. To reduce the training time of the classifier, we propose to use a simple algorithm, the ELM. ELM refers to a wider class of generalized single hidden layer feedforward networks (SLFNs) whose hidden layer need not be tuned. In this work, we explore kernel-based ELM to exploit dynamic kernels. We study the performance of the ELM-based classifiers using the proposed SLPSK and SLPMK on speech emotion recognition and speaker identification tasks and compare them with other kernels for varying length patterns. Experimental studies show that the proposed ELM-based approach offers a 10–12% relative improvement over the baseline approach and a 3–9% relative improvement over ELMs/SVMs using other state-of-the-art dynamic kernels.
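The abstract describes the method only at a high level, so the following NumPy sketch illustrates its three ingredients under our own assumptions rather than reproducing the paper's formulation. (1) Segment-level matching: at level l each utterance is split into 2^l near-equal contiguous segments and corresponding segments are compared; here matches are averaged within a level and summed over levels (equal level weights are an assumption). (2) Two illustrative segment kernels: an SLPMK-like stand-in that uses single-resolution histogram intersection over a codebook with hard assignment (a simplification of a full pyramid match), and an SLPSK-like stand-in that uses the cosine similarity of average Gaussian posterior vectors, where the Gaussians would stack class-specific and common (UBM-like) components; how those components are estimated is not covered. (3) The standard closed-form kernel ELM solution, beta = (I/C + K)^-1 T. All function names are hypothetical.

```python
import numpy as np


def split_segments(frames, level):
    """Split a (T, d) frame sequence into 2**level contiguous, near-equal segments."""
    return np.array_split(frames, 2 ** level)


def segment_level_kernel(X, Y, seg_kernel, levels):
    """Match corresponding segments of X and Y at levels 0 .. levels-1.

    Matches are averaged within a level and summed over levels; these equal
    level weights are an assumption of this sketch.
    """
    total = 0.0
    for level in range(levels):
        pairs = zip(split_segments(X, level), split_segments(Y, level))
        total += np.mean([seg_kernel(sx, sy) for sx, sy in pairs])
    return float(total)


def histogram_intersection(codebook):
    """SLPMK-like segment kernel (hypothetical simplification): intersection of
    normalized codeword histograms, single resolution, hard assignment."""
    def k(sx, sy):
        hists = []
        for seg in (sx, sy):
            dists = ((seg[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
            hists.append(counts / max(len(seg), 1))
        return float(np.minimum(hists[0], hists[1]).sum())
    return k


def psk_vector(frames, means, variances, priors):
    """Average posterior of each diagonal-covariance Gaussian over the frames.

    means/variances are (M, d), priors is (M,). In an SLPSK-like setup they
    would stack class-specific and common (UBM-like) components.
    """
    if len(frames) == 0:
        return np.zeros(len(priors))
    diff = frames[:, None, :] - means[None, :, :]  # (T, M, d)
    # log N(x; mean, diag(var)) for every frame and component -> (T, M)
    log_pdf = -0.5 * (diff ** 2 / variances + np.log(2 * np.pi * variances)).sum(axis=2)
    log_joint = np.log(priors) + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)  # for numerical stability
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)  # per-frame posteriors sum to 1
    return post.mean(axis=0)


def psk_cosine(means, variances, priors):
    """SLPSK-like segment kernel (hypothetical): cosine similarity of PSK vectors."""
    def k(sx, sy):
        a = psk_vector(sx, means, variances, priors)
        b = psk_vector(sy, means, variances, priors)
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0
    return k


def gram_matrix(seqs_a, seqs_b, seg_kernel, levels):
    """Kernel matrix between two lists of varying length (T_i, d) sequences."""
    return np.array([[segment_level_kernel(x, y, seg_kernel, levels) for y in seqs_b]
                     for x in seqs_a])


def kernel_elm_fit(K_train, labels, n_classes, C=1.0):
    """Closed-form kernel ELM output weights beta = (I / C + K)^-1 T, +1/-1 targets."""
    n = K_train.shape[0]
    T = -np.ones((n, n_classes))
    T[np.arange(n), labels] = 1.0
    return np.linalg.solve(np.eye(n) / C + K_train, T)


def kernel_elm_predict(K_test_train, beta):
    """Return the class with the largest ELM output; K_test_train is (n_test, n_train)."""
    return np.argmax(K_test_train @ beta, axis=1)
```

To classify test utterances, one would compute `gram_matrix(test_seqs, train_seqs, kernel, levels)` and pass that (n_test, n_train) matrix to `kernel_elm_predict`. In this sketch, training reduces to a single n-by-n linear solve with no iterative optimization, which illustrates why a kernel ELM can be simpler to train than an SVM solved by iterative quadratic programming.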


Similar content being viewed by others

The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances

Article Open access 18 December 2020

Alejandro Pasos Ruiz, Michael Flynn, … Anthony Bagnall

A comprehensive survey on automatic speech recognition using neural networks

Article 15 August 2023

Amandeep Singh Dhanjal & Williamjeet Singh

Speech Emotion Recognition: A Comprehensive Survey

Article 08 March 2023

Mohammed Jawad Al-Dujaili & Abbas Ebrahimi-Moghadam


Author information

Authors and Affiliations

School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, H.P., 175001, India
Shikha Gupta, Ahmed Karanath, Kansul Mahrifa & A. D. Dileep
Department of Computer Science and Engineering, National Institute of Technology Goa, Ponda, Goa, 403401, India
Veena Thenkanidiyoor


Corresponding author

Correspondence to Shikha Gupta.


About this article

Cite this article

Gupta, S., Karanath, A., Mahrifa, K. et al. Segment-level probabilistic sequence kernel and segment-level pyramid match kernel based extreme learning machine for classification of varying length patterns of speech. Int J Speech Technol 22, 231–249 (2019). https://doi.org/10.1007/s10772-018-09587-1


Received: 10 August 2018
Accepted: 22 December 2018
Published: 05 February 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-018-09587-1
