DOI: 10.1145/3163080.3163097
research-article

A Deep Autoencoder approach for Speaker Identification

Published: 27 November 2017

Abstract

Speaker Identification (SID) is one of the most prominent research areas and has gained tremendous momentum in recent years. The growing attention paid to machine learning approaches, especially deep neural networks (DNNs), has further increased interest in SID research. Despite the success of DNNs, deep autoencoders (DAEs) have received limited attention, and their applications have not been thoroughly explored. In this paper, we attempt to address this gap by applying DAEs to speaker identification from both analytical and experimental perspectives. The experiments were conducted using data from the 84 speakers in the AN4 corpus. To understand the significance of 'depth', i.e. the number of autoencoders in the DAE, we conducted multiple experiments with different numbers of autoencoder layers in the DAE. The experimental results show that a DAE network with three autoencoders achieved an identification accuracy of 98.8%, outperforming traditional neural networks. These findings confirm the importance of 'depth' highlighted in previous deep learning studies, particularly the difference in accuracy between regular backpropagation and layer-wise training. This paper further provides a new direction for implementing deep autoencoders in speaker identification.
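The abstract contrasts layer-wise training of a stacked DAE with regular backpropagation. As a generic illustration of greedy layer-wise pretraining, and not the authors' implementation, the minimal NumPy sketch below trains each autoencoder to reconstruct the codes of the one beneath it, stacks three encoders, and fits a softmax speaker classifier on the final codes. Everything specific here is an assumption: the sigmoid units and squared reconstruction error, the layer sizes 16, 12 and 8, the learning rates and epoch counts, the synthetic clustered data standing in for per-utterance speaker features (no AN4 audio or feature extraction is involved), and the hypothetical helper names `train_autoencoder`, `pretrain_stack`, `encode`, `train_softmax` and `run_demo`.

```python
# Hypothetical sketch of greedy layer-wise stacked-autoencoder training;
# not the paper's code. Sizes, rates and data are illustrative assumptions.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(x, n_hidden, lr=1.0, epochs=500, seed=0):
    """Train one sigmoid autoencoder with full-batch gradient descent on the
    squared reconstruction error; return only the encoder weights (W, b)."""
    rng = np.random.default_rng(seed)
    n, n_in = x.shape
    w1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_in))
    b2 = np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)          # encode
        r = sigmoid(h @ w2 + b2)          # decode (reconstruction of x)
        d2 = (r - x) * r * (1.0 - r)      # error at decoder pre-activation
        d1 = (d2 @ w2.T) * h * (1.0 - h)  # error backpropagated to encoder
        w2 -= lr * (h.T @ d2) / n
        b2 -= lr * d2.mean(axis=0)
        w1 -= lr * (x.T @ d1) / n
        b1 -= lr * d1.mean(axis=0)
    return w1, b1


def pretrain_stack(x, layer_sizes):
    """Greedy layer-wise pretraining: autoencoder k is trained on the codes
    produced by the already-trained autoencoders 1..k-1."""
    encoders, h = [], x
    for i, size in enumerate(layer_sizes):
        w, b = train_autoencoder(h, size, seed=i)
        encoders.append((w, b))
        h = sigmoid(h @ w + b)
    return encoders


def encode(x, encoders):
    """Pass inputs through every pretrained encoder in order."""
    for w, b in encoders:
        x = sigmoid(x @ w + b)
    return x


def train_softmax(h, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression (softmax output layer) on fixed codes."""
    w = np.zeros((h.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        z = h @ w + b
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / len(y)             # cross-entropy gradient wrt logits
        w -= lr * (h.T @ g)
        b -= lr * g.sum(axis=0)
    return w, b


def run_demo(seed=42):
    # Synthetic stand-in for per-utterance speaker feature vectors (NOT AN4 data).
    rng = np.random.default_rng(seed)
    n_speakers, per_speaker, dim = 4, 30, 20
    centers = rng.uniform(0.0, 1.0, (n_speakers, dim))
    x = np.vstack([c + rng.normal(0.0, 0.05, (per_speaker, dim)) for c in centers])
    y = np.repeat(np.arange(n_speakers), per_speaker)
    x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))  # scale into [0, 1]
    encoders = pretrain_stack(x, [16, 12, 8])  # three stacked autoencoders
    w, b = train_softmax(encode(x, encoders), y, n_speakers)
    predictions = np.argmax(encode(x, encoders) @ w + b, axis=1)
    return encoders, float((predictions == y).mean())


if __name__ == "__main__":
    encoders, acc = run_demo()
    print(f"layers: {len(encoders)}, training accuracy: {acc:.2f}")
```

The printed training accuracy on synthetic clusters says nothing about the 98.8% reported for AN4. Greedy pretraining is usually followed by supervised fine-tuning of all layers with backpropagation; this sketch omits that step and keeps the encoders frozen to stay short.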

    Information

    Published In

    ACM Other conferences
    ICSPS 2017: Proceedings of the 9th International Conference on Signal Processing Systems
    November 2017
    237 pages
    ISBN: 9781450353847
    DOI: 10.1145/3163080
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Deep learning
    2. Deep neural networks
    3. Speaker identification
    4. Stacked autoencoders

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSPS 2017

    Acceptance Rates

    Overall Acceptance Rate 46 of 83 submissions, 55%

    Article Metrics

    • Downloads (last 12 months): 9
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 28 Feb 2025

    Cited By

    • (2024) Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset. ICST Transactions on Scalable Information Systems. DOI: 10.4108/eetsis.5697. Online publication date: 9-Apr-2024
    • (2024) Deep Learning-Based End-to-End Speaker Identification Using Time–Frequency Representation of Speech Signal. Circuits, Systems, and Signal Processing 43(3): 1839-1861. DOI: 10.1007/s00034-023-02542-9. Online publication date: 1-Mar-2024
    • (2024) Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background. Cognitive Computing and Cyber Physical Systems, pp. 318-330. DOI: 10.1007/978-3-031-48888-7_27. Online publication date: 5-Jan-2024
    • (2023) Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System. IEEE Transactions on Neural Systems and Rehabilitation Engineering 31: 3407-3416. DOI: 10.1109/TNSRE.2023.3307020. Online publication date: 2023
    • (2023) Dysarthric Speech Recognition using Depthwise Separable Convolutions: Preliminary Study. 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), pp. 78-82. DOI: 10.1109/SpeD59241.2023.10314894. Online publication date: 25-Oct-2023
    • (2023) Dysarthric Speech Recognition: A Comparative Study. 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), pp. 89-94. DOI: 10.1109/SpeD59241.2023.10314881. Online publication date: 25-Oct-2023
    • (2023) Autism Artificial Intelligence Performance Analysis: Five Years of Operation. 2023 IEEE International Conference on Advanced Learning Technologies (ICALT), pp. 79-83. DOI: 10.1109/ICALT58122.2023.00029. Online publication date: Jul-2023
    • (2023) An optimized enhanced-multi learner approach towards speaker identification based on single-sound segments. Multimedia Tools and Applications 83(8): 24541-24562. DOI: 10.1007/s11042-023-16507-2. Online publication date: 17-Aug-2023
    • (2022) Assessing Machine Learning Approaches for Imputing Missing Values in Cardiovascular Dataset. 2022 IEEE 7th International conference for Convergence in Technology (I2CT), pp. 1-6. DOI: 10.1109/I2CT54291.2022.9825194. Online publication date: 7-Apr-2022
    • (2022) An optimum end-to-end text-independent speaker identification system using convolutional neural network. Computers and Electrical Engineering 100, article 107882. DOI: 10.1016/j.compeleceng.2022.107882. Online publication date: May-2022