Discriminating Speakers by Their Voices — A Fusion Based Approach

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

Speaker Discrimination (SD) is the task of deciding whether two speech segments were produced by the same speaker. In this research field, it is often unclear which classifier offers the best accuracy and robustness. To investigate this, we implemented nine classifiers: Support Vector Machines, Linear Discriminant Analysis, Multi-Layer Perceptron, Generalized Linear Model, Self-Organizing Map, AdaBoost, Second Order Statistical Measures, Linear Regression and Gaussian Mixture Models. We also propose a new fusion approach and evaluate it on speaker discrimination. Several experiments were conducted on Hub4 Broadcast-News with relatively short segments. The results show that the SVM is the best individual classifier and that the proposed fusion approach gives the best performance overall.
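The abstract does not describe how the proposed fusion works, so the following is only a generic illustration of decision-level fusion, not the authors' method. It assumes each classifier outputs a binary same-speaker/different-speaker decision for a pair of segments and combines them by strict majority vote. The function name `majority_vote` and the example votes are hypothetical.

```python
def majority_vote(decisions):
    """Fuse binary speaker-discrimination decisions by strict majority.

    `decisions` holds one boolean per classifier:
    True means "same speaker", False means "different speakers".
    This is a generic fusion rule (an assumption), not the paper's approach.
    """
    if not decisions:
        raise ValueError("need at least one classifier decision")
    same = sum(1 for d in decisions if d)
    # Strict majority: with an even number of voters, a tie yields False.
    return same * 2 > len(decisions)


# Hypothetical decisions from nine classifiers for one pair of segments.
votes = [True, True, False, True, True, False, True, True, False]
fused = majority_vote(votes)
```

With an odd number of classifiers, as in the nine listed in the abstract, a strict majority never produces a tie.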

Author information

Corresponding author

Correspondence to Halim Sayoud.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sayoud, H., Ouamour, S., Hamadache, Z. (2017). Discriminating Speakers by Their Voices — A Fusion Based Approach. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_31

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
