Efficient speaker identification using spectral entropy

Multimedia Tools and Applications

Abstract

Voice recognition comprises two main problems: speech recognition (what was said) and speaker recognition (who was speaking). The usual approach to speaker recognition is to postulate a model whose parameters correspond to the speaker identity; estimating those parameters can be time-consuming when the number of candidate speakers is large. In this paper, we model the speaker as a high-dimensional point cloud of entropy-based features extracted from the speech signal. The method allows indexing, and hence it can manage large databases. We experimentally assessed the quality of the identification on a publicly available database formed by extracting audio from a collection of YouTube videos of 1,000 different speakers. With 20-second audio excerpts, we were able to identify a speaker with 97% accuracy when the recording environment was not controlled, and with 99% accuracy for controlled recording environments.
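The idea of a speaker as a point cloud of entropy-based features can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, number of bands, equal-width band layout, and the function names `spectral_entropy_features` and `identify` are all assumptions made for illustration. A brute-force nearest-neighbor search with majority voting stands in for the index that the abstract mentions.

```python
import numpy as np


def spectral_entropy_features(signal, frame_len=512, hop=256, n_bands=8):
    """Return an (n_frames, n_bands) array of per-band spectral entropies.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Equal-width frequency bands are an assumption of this sketch.
        for b, band in enumerate(np.array_split(power, n_bands)):
            p = band / (band.sum() + 1e-12)  # normalize band power to a distribution
            feats[i, b] = -np.sum(p * np.log2(p + 1e-12))  # Shannon entropy in bits
    return feats


def identify(query_feats, clouds):
    """Pick the speaker whose cloud owns most of the query's nearest neighbors.

    `clouds` maps a speaker label to that speaker's feature array.
    Brute-force search is used here in place of a real index.
    """
    labels = list(clouds)
    points = np.vstack([clouds[k] for k in labels])
    owner = np.concatenate(
        [np.full(len(clouds[k]), j) for j, k in enumerate(labels)]
    )
    # Squared Euclidean distance from every query frame to every stored point.
    d = ((query_feats[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    votes = np.bincount(owner[d.argmin(axis=1)], minlength=len(labels))
    return labels[int(votes.argmax())]


if __name__ == "__main__":
    # Synthetic stand-ins for two "speakers": white noise and a pure tone.
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clouds = {
        "noise": spectral_entropy_features(rng.standard_normal(16000)),
        "tone": spectral_entropy_features(np.sin(2 * np.pi * 440.0 * t)),
    }
    query = spectral_entropy_features(rng.standard_normal(16000))
    print(identify(query, clouds))
```

Because each frame votes independently, the brute-force distance computation could in principle be swapped for an approximate nearest-neighbor index without changing the voting step, which is what would make this kind of representation practical for large speaker databases.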

Author information

Corresponding author

Correspondence to Fernando Luque-Suárez.

About this article

Cite this article

Luque-Suárez, F., Camarena-Ibarrola, A. & Chávez, E. Efficient speaker identification using spectral entropy. Multimed Tools Appl 78, 16803–16815 (2019). https://doi.org/10.1007/s11042-018-7035-9
