Abstract
Systems that combine i-vectors with probabilistic linear discriminant analysis (PLDA) have been applied with great success to the speaker recognition task. The i-vector space provides a low-dimensional representation of a speech segment and supplies the training data for a PLDA model, which offers greater robustness under varying conditions. In this paper, we propose a new framework based on i-vector/PLDA and Maximum Entropy (ME) to improve the performance of a speaker identification system in the presence of speech coding distortion. Results are reported on the TIMIT database and on coded speech obtained by passing the TIMIT test speech through the AMR encoder/decoder. Our results show that the proposed method achieves improved performance compared with the i-vector/PLDA and MEGMM systems.
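The abstract does not spell out how the ME model is combined with i-vector/PLDA, so the following is only an illustrative sketch of the general idea of a maximum entropy classifier over fixed-length vectors, which is equivalent to multinomial logistic regression. It is not the authors' implementation. The function names, the synthetic 4-dimensional stand-in vectors (used in place of real i-vectors), the learning rate, and the epoch count are all assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def posterior(weights, x):
    """ME model: p(speaker | x) is proportional to exp(w_speaker . x)."""
    scores = [sum(w_d * x_d for w_d, x_d in zip(w, x)) for w in weights]
    return softmax(scores)


def train_maxent(vectors, labels, n_classes, epochs=50, lr=0.1):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    dim = len(vectors[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            probs = posterior(weights, x)
            for c in range(n_classes):
                # Gradient term: observed indicator minus model probability.
                err = (1.0 if c == y else 0.0) - probs[c]
                for d in range(dim):
                    weights[c][d] += lr * err * x[d]
    return weights


def identify(weights, x):
    """Closed-set identification: return the most probable speaker index."""
    probs = posterior(weights, x)
    return max(range(len(probs)), key=probs.__getitem__)


def make_toy_data(per_speaker=20, seed=0):
    """Synthetic stand-ins for i-vectors: three well-separated speakers."""
    rng = random.Random(seed)
    centers = [[2.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 0.0, 0.0],
               [0.0, 0.0, 2.0, 0.0]]
    samples = []
    for spk, center in enumerate(centers):
        for _ in range(per_speaker):
            samples.append(([c + rng.gauss(0.0, 0.1) for c in center], spk))
    rng.shuffle(samples)
    vectors = [x for x, _ in samples]
    labels = [y for _, y in samples]
    return vectors, labels
```

A typical call would be `vectors, labels = make_toy_data()`, then `weights = train_maxent(vectors, labels, n_classes=3)`, then `identify(weights, some_vector)`. In a real system the inputs would be i-vectors extracted from clean or AMR-coded speech rather than synthetic points.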
Cite this article
Krobba, A., Debyeche, M. & Selouani, S.A. Maximum entropy PLDA for robust speaker recognition under speech coding distortion. Int J Speech Technol 22, 1115–1122 (2019). https://doi.org/10.1007/s10772-019-09642-5
DOI: https://doi.org/10.1007/s10772-019-09642-5