A Deep Autoencoder approach for Speaker Identification

Authors:

Sreenivas Sremath Tirumala,

Seyed Reza Shahamiri

ICSPS 2017: Proceedings of the 9th International Conference on Signal Processing Systems

Pages 175 - 179

https://doi.org/10.1145/3163080.3163097

Published: 27 November 2017

Abstract

Speaker Identification (SID) is a prominent research area that has gained considerable momentum in recent years. The growing attention to machine learning approaches, especially deep neural networks (DNNs), has further increased interest in SID research. Despite the success of DNNs, deep autoencoders (DAEs) have received limited attention and their applications have not been explored thoroughly. In this paper, we attempt to address this gap by applying DAEs to speaker identification, through both analysis and experiments. The experiments were conducted using data from 84 speakers provided in the AN4 corpus. To understand the significance of 'depth', i.e. the number of autoencoders in the DAE, multiple experiments with different numbers of autoencoder layers were conducted. The experimental results show that a DAE network with three autoencoders achieved an identification accuracy of 98.8%, outperforming traditional neural networks. The findings of this study confirm the importance of 'depth' as highlighted in previous deep learning studies, especially given the difference in accuracy between regular backpropagation and layer-wise training. This paper further provides a new direction for the implementation of deep autoencoders for speaker identification.
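
The abstract does not give the network sizes, input features, or training settings, so the following is only a generic sketch of greedy layer-wise pretraining for a stack of three autoencoders, not the authors' implementation. The helper names (`train_autoencoder`, `pretrain_stack`), the layer sizes, the learning rate, and the random toy data are all assumptions made for illustration; no AN4 data is used, and the supervised fine-tuning stage with a speaker classifier is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, n_hidden, epochs=300, lr=0.5, seed=0):
    """Train one autoencoder (sigmoid encoder, linear decoder) by
    full-batch gradient descent on the mean squared reconstruction error.
    Returns the encoder weights, encoder bias, and per-epoch losses."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct the input.
        h = sigmoid(x @ w_enc + b_enc)
        x_hat = h @ w_dec + b_dec
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradient of the mean squared error.
        g_out = 2.0 * err / err.size
        g_w_dec = h.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ w_dec.T) * h * (1.0 - h)
        g_w_enc = x.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec
    return w_enc, b_enc, losses

def pretrain_stack(x, layer_sizes):
    """Greedy layer-wise pretraining: each autoencoder is trained on the
    codes produced by the previously trained encoder."""
    encoders, history, h = [], [], x
    for i, n_hidden in enumerate(layer_sizes):
        w, b, losses = train_autoencoder(h, n_hidden, seed=i)
        encoders.append((w, b))
        history.append(losses)
        h = sigmoid(h @ w + b)  # input to the next autoencoder
    return encoders, history, h

# Toy stand-in for acoustic feature vectors (random numbers, not real speech).
rng = np.random.default_rng(42)
features = rng.random((200, 20))
encoders, history, codes = pretrain_stack(features, [16, 8, 4])
print(codes.shape)
```

In a full system along these lines, the pretrained encoders would typically initialise a deeper network that is then fine-tuned with speaker labels; varying the length of `layer_sizes` is one way to probe the effect of 'depth' that the abstract describes.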

Index Terms

A Deep Autoencoder approach for Speaker Identification
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information & Contributors

Information

Published In

ICSPS 2017: Proceedings of the 9th International Conference on Signal Processing Systems

November 2017

237 pages

ISBN: 9781450353847

DOI: 10.1145/3163080

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICSPS 2017

ICSPS 2017: The 9th International Conference on Signal Processing Systems

November 27 - 30, 2017

Auckland, New Zealand

Acceptance Rates

Overall Acceptance Rate 46 of 83 submissions, 55%

Bibliometrics & Citations

Article Metrics

Total Citations: 16
Total Downloads: 174
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Citations

Cited By

Niwatkar A, Kanse Y, Kushwaha A (2024). Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset. ICST Transactions on Scalable Information Systems. Online publication date: 9-Apr-2024.
https://doi.org/10.4108/eetsis.5697
Saritha B, Laskar M, Kirupakaran A, Laskar R, Choudhury M, Shome N (2024). Deep Learning-Based End-to-End Speaker Identification Using Time–Frequency Representation of Speech Signal. Circuits, Systems, and Signal Processing, 43:3 (1839-1861). Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1007/s00034-023-02542-9
Niwatkar A, Kanse Y, Kushwaha A (2024). Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background. Cognitive Computing and Cyber Physical Systems (318-330). Online publication date: 5-Jan-2024.
https://doi.org/10.1007/978-3-031-48888-7_27
Shahamiri S, Lal V, Shah D (2023). Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 31 (3407-3416). Online publication date: 2023.
https://doi.org/10.1109/TNSRE.2023.3307020
Shahamiri S, Mandal K, Sarkar S (2023). Dysarthric Speech Recognition using Depthwise Separable Convolutions: Preliminary Study. 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) (78-82). Online publication date: 25-Oct-2023.
https://doi.org/10.1109/SpeD59241.2023.10314894
Shah D, Lal V, Zhong Z, Wang Q, Shahamiri S (2023). Dysarthric Speech Recognition: A Comparative Study. 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) (89-94). Online publication date: 25-Oct-2023.
https://doi.org/10.1109/SpeD59241.2023.10314881
Shahamiri S (2023). Autism Artificial Intelligence Performance Analysis: Five Years of Operation. 2023 IEEE International Conference on Advanced Learning Technologies (ICALT) (79-83). Online publication date: Jul-2023.
https://doi.org/10.1109/ICALT58122.2023.00029
Shahamiri S (2023). An optimized enhanced-multi learner approach towards speaker identification based on single-sound segments. Multimedia Tools and Applications, 83:8 (24541-24562). Online publication date: 17-Aug-2023.
https://doi.org/10.1007/s11042-023-16507-2
Kaur P, Nisa Z, Tirumala S (2022). Assessing Machine Learning Approaches for Imputing Missing Values in Cardiovascular Dataset. 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (1-6). Online publication date: 7-Apr-2022.
https://doi.org/10.1109/I2CT54291.2022.9825194
Farsiani S, Izadkhah H, Lotfi S (2022). An optimum end-to-end text-independent speaker identification system using convolutional neural network. Computers and Electrical Engineering, 100 (107882). Online publication date: May-2022.
https://doi.org/10.1016/j.compeleceng.2022.107882
