Text-Independent Speaker Identification Using Formants and Convolutional Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13068)

Abstract

Text-independent speaker identification consists of determining the identity of an individual from his or her voice independently of the content of the speech signal, that is, regardless of the words uttered by the speaker. This problem is harder than text-dependent speaker recognition, where the speaker has to utter a specific word or phrase in order to be recognized. However, text-independent speaker identification is the problem to solve when the speaker has to be recognized without his or her collaboration, as is frequently the case in practical situations. Our proposal consists of searching the speech signal for voiced speech content, which is the kind of speech produced when the vocal cords are vibrating. Once these segments of speech are identified, their formants, which are the resonance frequencies of the vocal tract, are determined. We use these formants to produce images that we believe should differ from one speaker to another; the way such images are built is original. Each image represents a specific speaker, so the problem of identifying speakers is turned into a problem of image recognition, a task for which convolutional neural networks are known to be effective. For our experiments we used a collection of recordings from 21 individuals and achieved an accuracy of 92%, outperforming the best results for text-independent identification published in recent works that used the same collection for testing.
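
The abstract does not say how voiced segments are detected or how the formants are computed, so the following is only an illustrative sketch of the first two stages under stated assumptions. It uses a crude energy and zero-crossing-rate test for voicing, and peak picking on a linear-prediction (LPC) spectral envelope for the formants, both common textbook choices. The function names (`is_voiced`, `lpc`, `formants`), the thresholds, and the default model order are hypothetical and are not taken from the paper. The construction of the formant images is not reproduced, since the abstract does not describe it.

```python
import cmath
import math


def is_voiced(frame, energy_thresh=1e-3, zcr_thresh=0.25):
    """Crude voicing heuristic (an assumption, not the paper's detector):
    voiced speech tends to have high energy and few zero crossings."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    zcr = crossings / (len(frame) - 1)
    return energy > energy_thresh and zcr < zcr_thresh


def lpc(frame, order):
    """Autocorrelation-method LPC solved with the Levinson-Durbin recursion.
    Returns [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    n = len(frame)
    # Hamming window to reduce edge effects before autocorrelation.
    w = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(frame)]
    r = [sum(w[i] * w[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0.0:
        raise ValueError("silent frame: LPC is undefined")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants(frame, fs, order=10, step_hz=10.0):
    """Estimate formants as local maxima of the LPC envelope 1/|A(e^jw)|,
    evaluated on a grid from 0 Hz to fs/2 in steps of step_hz."""
    a = lpc(frame, order)
    n_bins = int((fs / 2) / step_hz) + 1
    env = []
    for b in range(n_bins):
        z = cmath.exp(-2j * math.pi * (b * step_hz) / fs)
        env.append(1.0 / abs(sum(c * z ** k for k, c in enumerate(a))))
    return [b * step_hz for b in range(1, n_bins - 1)
            if env[b - 1] < env[b] >= env[b + 1]]
```

In a pipeline of this kind, a recording would be cut into short frames (for example 20 to 30 ms), `is_voiced` would select the frames to keep, and `formants(frame, fs)` would give the per-frame resonance estimates from which a speaker image could then be built. The order and thresholds would need tuning for real speech.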

Author information

Corresponding author

Correspondence to Antonio Camarena-Ibarrola.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Camarena-Ibarrola, A., Reynoso, M., Figueroa, K. (2021). Text-Independent Speaker Identification Using Formants and Convolutional Neural Networks. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_9

  • DOI: https://doi.org/10.1007/978-3-030-89820-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89819-9

  • Online ISBN: 978-3-030-89820-5

  • eBook Packages: Computer Science, Computer Science (R0)
