Maximum entropy PLDA for robust speaker recognition under speech coding distortion

International Journal of Speech Technology

Abstract

Systems that combine i-vectors with probabilistic linear discriminant analysis (PLDA) have been applied with great success to the speaker recognition task. The i-vector gives a low-dimensional representation of a speech segment and serves as training data for the PLDA model, which offers greater robustness under varying conditions. In this paper, we propose a new framework based on i-vector/PLDA and Maximum Entropy (ME) to improve the performance of a speaker identification system in the presence of speech coding distortion. Results are reported on the TIMIT database, with coded speech obtained by passing the TIMIT test speech through the AMR encoder/decoder. Our results show that the proposed method achieves improved performance compared with the i-vector/PLDA and MEGMM systems.
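The abstract does not say how the ME model and PLDA are combined, so the following is only a minimal sketch of the generic maximum entropy idea, not the authors' method. It trains a maximum entropy (multinomial logistic) classifier on fixed-length vectors that stand in for i-vectors. Everything here is an assumption made for illustration: the synthetic 4-dimensional vectors, the three hypothetical speakers, the function names, and the learning rate. No PLDA, TIMIT data, or AMR coding is involved.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def posterior(weights, vector):
    """P(speaker | vector) under the maximum entropy (log-linear) model."""
    augmented = list(vector) + [1.0]  # trailing 1.0 feeds the bias weight
    scores = [sum(w * v for w, v in zip(row, augmented)) for row in weights]
    return softmax(scores)


def train_maxent(vectors, labels, n_speakers, learning_rate=0.5, epochs=200):
    """Fit weights by batch gradient descent on the negative log-likelihood."""
    dim = len(vectors[0]) + 1  # +1 for the bias term
    weights = [[0.0] * dim for _ in range(n_speakers)]
    for _ in range(epochs):
        grads = [[0.0] * dim for _ in range(n_speakers)]
        for vector, label in zip(vectors, labels):
            probs = posterior(weights, vector)
            augmented = list(vector) + [1.0]
            for k in range(n_speakers):
                # Gradient term: predicted probability minus the 0/1 target.
                error = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    grads[k][j] += error * augmented[j]
        for k in range(n_speakers):
            for j in range(dim):
                weights[k][j] -= learning_rate * grads[k][j] / len(vectors)
    return weights


def identify(weights, vector):
    """Return the index of the most probable speaker."""
    probs = posterior(weights, vector)
    return max(range(len(probs)), key=lambda k: probs[k])


# Synthetic stand-ins for i-vectors: one cluster per hypothetical speaker.
rng = random.Random(0)
centers = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
vectors, labels = [], []
for speaker, center in enumerate(centers):
    for _ in range(10):
        vectors.append([c + rng.gauss(0.0, 0.3) for c in center])
        labels.append(speaker)

weights = train_maxent(vectors, labels, n_speakers=len(centers))
print([identify(weights, center) for center in centers])
```

In a real experiment the input vectors would come from an i-vector extractor applied to clean and codec-processed speech, and the classifier would be evaluated on held-out utterances rather than on cluster centers.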

Author information

Corresponding author

Correspondence to Ahmed Krobba.

Cite this article

Krobba, A., Debyeche, M. & Selouani, S.A. Maximum entropy PLDA for robust speaker recognition under speech coding distortion. Int J Speech Technol 22, 1115–1122 (2019). https://doi.org/10.1007/s10772-019-09642-5

  • DOI: https://doi.org/10.1007/s10772-019-09642-5
