Entropy Based Feature Pooling in Speech Command Classification

Nalmpantis, Christoforos; Vrysis, Lazaros; Vlachava, Danai; Papageorgiou, Lefteris; Vrakas, Dimitris

doi:10.1007/978-3-030-80129-8_71

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 285)

Abstract

In this research a novel deep learning architecture is proposed for speech command recognition. The problem is examined in the context of the Internet of Things, where most devices have limited computational and memory resources. The architecture is distinguished by a new feature pooling mechanism, named entropy pooling. In contrast to other pooling operations, which use arbitrary criteria for feature selection, it is based on the principle of maximum entropy. The proposed deep neural network performs comparably to other state-of-the-art models while being less than half their size.
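
The abstract does not say how entropy pooling is computed. The sketch below is only one possible reading of a maximum-entropy selection rule, not the authors' implementation: within each pooling window it keeps the value that is rarest in the feature map, i.e. the one with the highest self-information. The function name `entropy_pool_1d`, the histogram-based probability estimate, and the `window` and `bins` parameters are all assumptions made for illustration.

```python
import numpy as np

def entropy_pool_1d(x, window=2, bins=8):
    """Hypothetical pooling rule: per window, keep the rarest (most informative) value."""
    x = np.asarray(x, dtype=float)
    # Assumption: value probabilities are estimated with equal-width bins
    # spanning the range of the whole feature map.
    edges = np.linspace(x.min(), x.max(), bins + 1)
    bin_idx = np.digitize(x, edges[1:-1])          # bin index in 0..bins-1
    counts = np.bincount(bin_idx, minlength=bins)
    probs = counts / counts.sum()
    info = -np.log(probs[bin_idx])                 # self-information per value
    # Drop any tail that does not fill a whole window, then group by window.
    n = (x.size // window) * window
    x_win = x[:n].reshape(-1, window)
    info_win = info[:n].reshape(-1, window)
    # Select the highest-information element of each window (ties: first one).
    pick = info_win.argmax(axis=1)
    return x_win[np.arange(x_win.shape[0]), pick]

features = [2, 2, 2, 0, 2, 2, 2, 2]
pooled = entropy_pool_1d(features, window=4, bins=4)
print(pooled)
```

On this toy input the first window yields the rare value 0 rather than the maximum 2, which shows how such a rule can differ from max pooling.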

Acknowledgement

This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: T1EDK-00343(95699), Energy Controlling Voice Enabled Intelligent Smart Home Ecosystem).

Author information

Authors and Affiliations

Aristotle University of Thessaloniki, Thessaloniki, Greece
Christoforos Nalmpantis, Lazaros Vrysis & Dimitris Vrakas
International Hellenic University, Thessaloniki, Greece
Danai Vlachava
Entranet Ltd, Thessaloniki, Greece
Lefteris Papageorgiou

Corresponding author

Correspondence to Christoforos Nalmpantis.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Nalmpantis, C., Vrysis, L., Vlachava, D., Papageorgiou, L., Vrakas, D. (2021). Entropy Based Feature Pooling in Speech Command Classification. In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 285. Springer, Cham. https://doi.org/10.1007/978-3-030-80129-8_71

DOI: https://doi.org/10.1007/978-3-030-80129-8_71
Published: 06 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80128-1
Online ISBN: 978-3-030-80129-8
eBook Packages: Intelligent Technologies and Robotics (R0)
