Efficient speaker identification using spectral entropy

Luque-Suárez, Fernando; Camarena-Ibarrola, Antonio; Chávez, Edgar

doi:10.1007/s11042-018-7035-9

Published: 02 January 2019

Multimedia Tools and Applications, volume 78, pages 16803–16815 (2019)

Fernando Luque-Suárez¹,
Antonio Camarena-Ibarrola² &
Edgar Chávez¹ (ORCID: orcid.org/0000-0002-0148-695X)

Abstract

In voice recognition, the two main problems are speech recognition (what was said) and speaker recognition (who was speaking). The usual approach to speaker recognition is to postulate a model in which the speaker's identity corresponds to the model parameters; estimating those parameters can be time-consuming when the number of candidate speakers is large. In this paper, we model the speaker as a high-dimensional point cloud of entropy-based features extracted from the speech signal. The method allows indexing and can therefore manage large databases. We experimentally assessed the quality of the identification on a publicly available database formed by extracting audio from a collection of YouTube videos of 1,000 different speakers. With 20-second audio excerpts, we were able to identify a speaker with 97% accuracy when the recording environment was not controlled, and with 99% accuracy in controlled recording environments.
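The abstract does not spell out how the entropy-based features are computed or how the point clouds are compared, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each frame is described by the Shannon entropy of its normalized power spectrum in a few equal-width bands, and that a query is assigned to the speaker whose enrolled points are most often the nearest neighbors. The function names, the frame length, the hop size, and the number of bands are all hypothetical choices made for illustration.

```python
import numpy as np

def spectral_entropy_features(signal, frame_len=256, hop=128, n_bands=8):
    """Return one entropy vector per frame (shape: n_frames x n_bands)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Equal-width bands are an assumption; the paper may use another layout.
        for b, band in enumerate(np.array_split(power, n_bands)):
            p = band / (band.sum() + 1e-12)  # normalize to a distribution
            feats[i, b] = -np.sum(p * np.log2(p + 1e-12))  # Shannon entropy in bits
    return feats

def identify(query_cloud, enrolled):
    """Each query point votes for the speaker that owns its nearest enrolled point."""
    votes = dict.fromkeys(enrolled, 0)
    for q in query_cloud:
        nearest = min(
            enrolled,
            key=lambda s: np.min(np.linalg.norm(enrolled[s] - q, axis=1)),
        )
        votes[nearest] += 1
    return max(votes, key=votes.get)
```

The nearest-neighbor search here is brute force. The indexing mentioned in the abstract would replace that inner search with a metric or approximate nearest-neighbor index, which is what lets the approach scale to many speakers.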

Author information

Authors and Affiliations

CICESE, Ensenada, Mexico
Fernando Luque-Suárez & Edgar Chávez
Universidad Michoacana, Morelia, Mexico
Antonio Camarena-Ibarrola

Corresponding author

Correspondence to Fernando Luque-Suárez.

About this article

Cite this article

Luque-Suárez, F., Camarena-Ibarrola, A. & Chávez, E. Efficient speaker identification using spectral entropy. Multimed Tools Appl 78, 16803–16815 (2019). https://doi.org/10.1007/s11042-018-7035-9

Received: 23 April 2018
Revised: 08 November 2018
Accepted: 07 December 2018
Published: 02 January 2019
Issue Date: 30 June 2019
DOI: https://doi.org/10.1007/s11042-018-7035-9
