Abstract
For wireless remote access security, there is an emerging need for biometric speaker identification (SID) systems that are robust to speech coding distortion. This paper presents results for a Gaussian mixture model (GMM) based SID system that is trained on clean speech and tested on speech decoded by the G.729 codec. To mitigate the performance loss caused by mismatched training and testing conditions, five robust features, two enhancement approaches and three fusion strategies are used. The first enhancement method is feature compensation based on the affine transform. The second is the McCree signal enhancement approach, which uses the spectral envelope information in the G.729 bit stream. Ensemble systems using decision-level fusion, score fusion and Borda count are studied. The best performance is obtained by combining signal enhancement, feature compensation and decision-level fusion, which yields an identification success rate (ISR) of 89.8%.
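The three fusion strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the layout of the score table (one row of per-speaker scores per system, higher meaning a better match) and the example numbers are all assumptions made for illustration. Summing raw scores also assumes the systems produce scores on comparable scales, which the abstract does not specify.

```python
from collections import Counter

# Hypothetical layout: scores[i][k] is the score system i gives speaker k
# (for example a GMM log-likelihood); higher means a better match.

def decision_fusion(scores):
    """Majority vote over each system's top-scoring speaker."""
    votes = [max(range(len(s)), key=s.__getitem__) for s in scores]
    # most_common breaks ties in favor of the vote seen first
    return Counter(votes).most_common(1)[0][0]

def score_fusion(scores):
    """Sum each speaker's scores across systems and pick the largest total."""
    totals = [sum(column) for column in zip(*scores)]
    return max(range(len(totals)), key=totals.__getitem__)

def borda_count(scores):
    """Each system awards rank points: 0 to its worst speaker, n-1 to its best."""
    n = len(scores[0])
    points = [0] * n
    for s in scores:
        worst_to_best = sorted(range(n), key=s.__getitem__)
        for rank, speaker in enumerate(worst_to_best):
            points[speaker] += rank
    return max(range(n), key=points.__getitem__)

# Illustrative (made-up) scores: three systems, three enrolled speakers.
example = [
    [-1.0, -2.0, -50.0],   # system 0 prefers speaker 0
    [-1.0, -2.0, -50.0],   # system 1 prefers speaker 0
    [-30.0, -2.0, -1.0],   # system 2 prefers speaker 2
]
print(decision_fusion(example), score_fusion(example), borda_count(example))
```

In this made-up example the vote and the Borda count both select speaker 0, while score fusion selects speaker 1, because a single very low score from system 2 dominates the sum. This shows how the three rules can disagree on the same set of system outputs.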
References
Jain, A.K., Ross, A., Nandakumar, K.: Introduction to Biometrics. Springer (2011)
Togneri, R., Pullella, D.: An overview of speaker identification: Accuracy and robustness issues. IEEE Circuits and Systems Magazine, 23–61 (2011)
Fazel, A., Chakrabartty, S.: An overview of statistical pattern recognition techniques for speaker verification. IEEE Circuits and Systems Magazine, 62–81 (2011)
Campbell, J.P., Shen, W., Campbell, W.M., Schwartz, R., Bonastre, J.-F., Matrouf, D.: Forensic speaker recognition. IEEE Signal Proc. Mag., 95–103 (2009)
Mammone, R.J., Zhang, X., Ramachandran, R.P.: Robust speaker recognition - A feature based approach. IEEE Signal Proc. Mag., 58–71 (1996)
ITU-T: Recommendation G.729 - Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP) (2007)
Moreno-Daniel, A., Juang, B.-H., Nolazco-Flores, J.A.: Robustness of bit-stream based features for speaker verification. In: IEEE Int. Conf. on Acoustics, Speech and Signal Proc., pp. I-749–I-752 (2005)
McCree, A.: Reducing Speech Coding Distortion for Speaker Identification. In: IEEE Int. Conf. on Spoken Language Proc. (2006)
Zilovic, M.S., Ramachandran, R.P., Mammone, R.J.: Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. IEEE Trans. on Speech and Audio Proc., 260–267 (1998)
Polikar, R.: Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 21–45 (2006)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Raval, K., Ramachandran, R.P., Shetty, S.S., Smolenski, B.Y. (2012). Feature and Signal Enhancement for Robust Speaker Identification of G.729 Decoded Speech. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34500-5_41
DOI: https://doi.org/10.1007/978-3-642-34500-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34499-2
Online ISBN: 978-3-642-34500-5
eBook Packages: Computer Science, Computer Science (R0)