Abstract
Speakers are generally identified using features derived from the Fourier transform magnitude, such as the mel-frequency cepstral coefficients (MFCC). The modified group delay feature (MODGDF), derived from the Fourier transform phase, has been used effectively for speaker recognition in our previous work. Although the efficacy of the MODGDF as an alternative to the MFCC is yet to be established, our earlier work has shown that composite features derived from the MFCC and the MODGDF perform extremely well. In this paper, we investigate the cluster structure of speakers in a lower-dimensional feature space derived from the MODGDF. Three nonlinear dimensionality reduction techniques, namely Sammon mapping, ISOMAP, and locally linear embedding (LLE), are used to visualize speaker clusters in this lower-dimensional space. We identify the intrinsic dimensionality of both the MODGDF and the MFCC using the elbow technique. We also present the results of speaker identification experiments performed using the MODGDF, the MFCC, and composite features derived from both.
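The abstract does not say which curve the elbow is read from. A common choice, following Tenenbaum et al. (ISOMAP), is to plot the residual variance of the embedding against its dimension and take the dimension at which the curve flattens. The sketch below illustrates that idea under stated assumptions. The neighbourhood size `k=8`, the flattening threshold `tol=0.05`, and the function names are hypothetical choices, not the authors' settings. The toy data is a 2-D plane rotated into 10-D space and stands in for MODGDF or MFCC vectors.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off


def geodesic_distances(X, k=8):
    """ISOMAP steps 1-2: k-nearest-neighbour graph, then all-pairs shortest paths.

    k is a hypothetical choice; the graph must come out connected.
    """
    D = pairwise_distances(X)
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)  # make the neighbourhood graph symmetric
    np.fill_diagonal(G, 0.0)
    for m in range(n):  # Floyd-Warshall relaxation through vertex m
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    return G


def classical_mds(G, dim):
    """ISOMAP step 3: embed a distance matrix in `dim` dimensions via classical MDS."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


def residual_variance(G, max_dim=6):
    """1 - R^2 between geodesic and embedded distances, for dims 1..max_dim."""
    iu = np.triu_indices_from(G, k=1)
    out = []
    for d in range(1, max_dim + 1):
        Y = classical_mds(G, d)
        r = np.corrcoef(G[iu], pairwise_distances(Y)[iu])[0, 1]
        out.append(1.0 - r ** 2)
    return np.array(out)


def elbow(curve, tol=0.05):
    """First dimension d such that moving to d + 1 lowers the curve by less than tol."""
    drops = curve[:-1] - curve[1:]
    for i, drop in enumerate(drops):
        if drop < tol:
            return i + 1
    return len(curve)


# Toy check: a 12 x 12 grid on a 2-D plane, rotated into 10-D space.
rng = np.random.default_rng(0)
u, v = np.meshgrid(np.linspace(0, 1, 12), np.linspace(0, 1, 12))
P = np.column_stack([u.ravel(), v.ravel()])
Q, _ = np.linalg.qr(rng.standard_normal((10, 2)))  # orthonormal 10 x 2 basis
X = P @ Q.T
rv = residual_variance(geodesic_distances(X, k=8), max_dim=5)
print(np.round(rv, 3))
print("estimated intrinsic dimensionality:", elbow(rv))
```

A fixed threshold on the drop is only one way to locate the elbow. Inspecting the plotted curve by eye, or using a maximum-curvature rule, are equally plausible readings of "the elbow technique".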
References
Hegde, R.M., Murthy, H.A., Rao Gadde, V.R.: Application of the Modified Group Delay Function to Speaker Identification and Discrimination. In: Proceedings of ICASSP 2004, May 2004, vol. 1, pp. 517–520 (2004)
Hegde, R.M., Murthy, H.A.: Speaker Identification using the modified group delay feature. In: Proceedings of the International Conference on Natural Language Processing (ICON 2003), December 2003, pp. 159–167 (2003)
Murthy, H.A., Rao Gadde, V.R.: The Modified group delay function and its application to phoneme recognition. In: Proceedings of ICASSP 2003, April 2003, vol. 1, pp. 68–71 (2003)
Sammon Jr., J.W.: A Nonlinear Mapping for Data Structure Analysis. IEEE Transactions on Computers C-18(5), 401–409 (1969)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290(5500), 2319–2323 (2000), http://www.science.org
Roweis, S.T., Saul, L.K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 290(5500), 2323–2326 (2000), http://www.science.org
Jankowski, C., Kalyanswamy, A., Basson, S., Spitz, J.: NTIMIT: A Phonetically Balanced, Continuous Speech, Telephone Bandwidth Speech Database. In: Proceedings of ICASSP 1990, April 1990 (1990)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hegde, R.M., Murthy, H.A. (2004). Cluster and Intrinsic Dimensionality Analysis of the Modified Group Delay Feature for Speaker Classification. In: Pal, N.R., Kasabov, N., Mudi, R.K., Pal, S., Parui, S.K. (eds) Neural Information Processing. ICONIP 2004. Lecture Notes in Computer Science, vol 3316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30499-9_182
DOI: https://doi.org/10.1007/978-3-540-30499-9_182
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23931-4
Online ISBN: 978-3-540-30499-9
eBook Packages: Springer Book Archive