Abstract
Speech and sound recognition in home automation scenarios has attracted increasing interest over the last decade. One approach addressed in the literature is based on the template matching paradigm, which is easy to implement and does not depend on large datasets for system training. Building on a recent contribution by some of the authors, in which an Extreme Learning Machine (ELM) algorithm was proposed and evaluated, this work provides a wider performance analysis under diverse operating conditions, together with some relevant improvements. These are made possible by using supervector features as input, which, to the best of the authors’ knowledge, are used with ELMs here for the first time. As already verified in other application contexts and with different learning systems, this ensures a more robust characterization of the speech segment to be classified, even in the presence of mismatch between training and testing data. The computer simulations confirm the effectiveness of the approach, with F\(_1\)-Measure performance of up to 99 % in the multicondition case and a computational time reduction factor close to 4 with respect to the SVM counterpart.
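As a rough illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of a generic single-hidden-layer ELM with a ridge-regularized output layer, scored with the F\(_1\)-Measure. It is not the authors' implementation: the class name `ELMClassifier`, the helper `f1_binary`, the hidden-layer size, the regularization value, and the synthetic fixed-length vectors standing in for supervectors are all assumptions made for this example.

```python
import numpy as np


class ELMClassifier:
    """Generic ELM sketch: random hidden layer, closed-form output weights."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden          # number of random hidden units (assumed value)
        self.reg = reg                    # ridge regularization strength (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations; the input weights are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Random input weights and biases, scaled to limit tanh saturation.
        self.W = self.rng.standard_normal((n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights from regularized least squares: (H^T H + reg*I) beta = H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def make_toy_vectors(n_per_class, dim, rng):
    """Synthetic stand-in for supervectors: two Gaussian clusters."""
    X0 = rng.standard_normal((n_per_class, dim)) - 3.0
    X1 = rng.standard_normal((n_per_class, dim)) + 3.0
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_train, y_train = make_toy_vectors(100, 20, rng)
    X_test, y_test = make_toy_vectors(50, 20, rng)
    model = ELMClassifier().fit(X_train, y_train)
    print("F1 on toy data:", f1_binary(y_test, model.predict(X_test)))
```

Because only the output weights are solved for, and in closed form, training reduces to one linear system; this is the usual reason given for the short training times of ELMs, although the sketch makes no claim about the speed-up reported in the chapter.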
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
della Porta, G., Principi, E., Ferroni, G., Squartini, S., Hussain, A., Piazza, F. (2016). ELM Based Algorithms for Acoustic Template Matching in Home Automation Scenarios: Advancements and Performance Analysis. In: Esposito, A., et al. Recent Advances in Nonlinear Speech Processing. Smart Innovation, Systems and Technologies, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-28109-4_16
DOI: https://doi.org/10.1007/978-3-319-28109-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28107-0
Online ISBN: 978-3-319-28109-4
eBook Packages: Engineering, Engineering (R0)