Abstract
We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that contributions of the MLP systems are no longer additive.
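As a rough illustration of the two feature types described above, the following minimal sketch extracts output activations from a phone-target MLP and hidden activations from a speaker-target MLP. It is not the authors' implementation: the class name `TinyMLP`, the layer sizes, the random (untrained) weights, the synthetic input frames, and the use of log outputs are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtract the row-wise maximum for numerical stability.
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class TinyMLP:
    """Hypothetical one-hidden-layer MLP with random, untrained weights."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def hidden(self, frames):
        # Hidden-layer activations, one row per input frame.
        return sigmoid(frames @ self.W1 + self.b1)

    def outputs(self, frames):
        # Output-layer activations (posteriors over the training targets).
        return softmax(self.hidden(frames) @ self.W2 + self.b2)

# Synthetic stand-in for cepstral frames: 200 frames of 39 dimensions (assumed sizes).
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 39))

# Phone-target MLP: output activations become features for a GMM system.
# Taking the log of the outputs is an assumption here, not stated in the abstract.
phone_mlp = TinyMLP(n_in=39, n_hidden=64, n_out=46)
tandem_features = np.log(phone_mlp.outputs(frames))

# Speaker-target MLP: hidden activations become features for an SVM system.
speaker_mlp = TinyMLP(n_in=39, n_hidden=64, n_out=30)
speaker_features = speaker_mlp.hidden(frames)
```

In this sketch the phone-target features have one dimension per output target, while the speaker-target features have one dimension per hidden unit, so the hidden-layer size controls the feature dimension independently of the number of training speakers.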
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stoll, L., Frankel, J., Mirghafori, N. (2007). Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_8
DOI: https://doi.org/10.1007/978-3-540-77347-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science (R0)