DOI:10.1145/3015166.3015210
research-article

A review on Deep Learning approaches in Speaker Identification

Published: 21 November 2016

Abstract

Deep learning (DL) is becoming an increasingly interesting and powerful machine learning method, with successful applications in many domains such as natural language processing, image recognition, handwritten character recognition, and computer vision. Despite its eminent success, limitations of traditional learning approaches may still prevent deep learning from being applied to a wide range of realistic learning tasks. DL approaches have shown success in speech recognition and speaker identification over traditional approaches, such as those that use Mel Frequency Cepstral Coefficients for feature extraction with Gaussian Mixture Models. However, the speaker identification research community is not fully aware of the DL process and its application to speaker identification. This paper aims to reduce this knowledge gap and to promote research on implementing deep learning techniques for speaker identification. We present a review of the DL methodologies used for speaker identification and survey important DL algorithms that can potentially be explored in future work. We categorise the applications of DL for speaker identification according to the stages of the speaker identification process and review these implementations.

Published In

ICSPS 2016: Proceedings of the 8th International Conference on Signal Processing Systems
November 2016
235 pages
ISBN:9781450347907
DOI:10.1145/3015166

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep learning
  2. feature extraction
  3. speaker identification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSPS 2016

Acceptance Rates

ICSPS 2016 Paper Acceptance Rate 46 of 83 submissions, 55%;
Overall Acceptance Rate 46 of 83 submissions, 55%
