Maximum entropy PLDA for robust speaker recognition under speech coding distortion

Krobba, Ahmed; Debyeche, Mohamed; Selouani, Sid. Ahmed

doi:10.1007/s10772-019-09642-5

Published: 21 October 2019

Volume 22, pages 1115–1122 (2019)

International Journal of Speech Technology

Abstract

The system combining i-vectors and probabilistic linear discriminant analysis (PLDA) has been applied with great success to speaker recognition. The i-vector space provides a low-dimensional representation of a speech segment, and these i-vectors serve as training data for a PLDA model, which offers greater robustness under different conditions. In this paper, we propose a new framework based on i-vector/PLDA and Maximum Entropy (ME) to improve the performance of a speaker identification system in the presence of speech coding distortion. Results are reported on the TIMIT database, with coded speech obtained by passing the TIMIT test speech through the AMR encoder/decoder. Our results show that the proposed method achieves improved performance compared with the i-vector/PLDA and MEGMM systems.
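For readers unfamiliar with the i-vector/PLDA baseline the abstract refers to, here is a minimal sketch of standard two-covariance Gaussian PLDA scoring used for closed-set speaker identification. It assumes the i-vectors are already extracted and that the global mean `mu`, the between-speaker covariance `B` and the within-speaker covariance `W` have already been estimated; all function and variable names are hypothetical, and the Maximum Entropy component of the authors' proposed method is not reproduced here.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def plda_llr(x1, x2, mu, B, W):
    """Two-covariance PLDA log-likelihood ratio (same vs. different speaker).

    Sketch only: standard Gaussian PLDA, not the authors' ME-PLDA.
    B (between-speaker) and W (within-speaker) are assumed already trained.
    """
    d = mu.shape[0]
    T = B + W  # total covariance of a single i-vector
    z = np.concatenate([x1, x2])
    m = np.concatenate([mu, mu])
    zeros = np.zeros((d, d))
    # Same speaker: both i-vectors share one latent speaker vector,
    # so they covary through B.
    cov_same = np.block([[T, B], [B, T]])
    # Different speakers: the two i-vectors are independent.
    cov_diff = np.block([[T, zeros], [zeros, T]])
    return gaussian_logpdf(z, m, cov_same) - gaussian_logpdf(z, m, cov_diff)


def identify(test_ivec, enrolled, mu, B, W):
    """Closed-set identification: index of the best-scoring enrolled speaker."""
    scores = [plda_llr(e, test_ivec, mu, B, W) for e in enrolled]
    return int(np.argmax(scores)), scores
```

Scoring the test i-vector against every enrolled speaker and taking the argmax turns the PLDA verification score into an identification decision, which is the task evaluated in the abstract. In this sketch each speaker is enrolled with a single i-vector, which is a simplifying assumption.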

Author information

Authors and Affiliations

Speech Communication and Signal Processing Laboratory, Université des Sciences et de la Technologie Houari Boumediene (USTHB), Algiers, Algeria
Ahmed Krobba & Mohamed Debyeche
LARIHS Laboratory, Campus Shippagan, University of Moncton, Moncton, Canada
Sid. Ahmed Selouani

Corresponding author

Correspondence to Ahmed Krobba.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Krobba, A., Debyeche, M. & Selouani, S.A. Maximum entropy PLDA for robust speaker recognition under speech coding distortion. Int J Speech Technol 22, 1115–1122 (2019). https://doi.org/10.1007/s10772-019-09642-5

Received: 18 March 2019
Accepted: 26 September 2019
Published: 21 October 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10772-019-09642-5
