Abstract
In classification tasks, the error rate is proportional to the commonality among classes. In the conventional GMM-based modeling technique, the model parameters of a class are estimated without considering the other classes in the system, so features that are common across classes may be captured along with the unique ones. This paper proposes using the unique characteristics of a class, at the feature level and at the phoneme level separately, to improve classification accuracy. At the feature level, the performance of a classifier is analyzed by capturing the unique features during modeling and by removing common feature vectors during classification. Experiments were conducted on a speaker identification task, using speech data of 40 female speakers from the NTIMIT corpus, and on a language identification task, using speech data of two languages (English and French) from the OGI_MLTS corpus. At the phoneme level, the performance of a classifier is analyzed on a speaker identification task by identifying a subset of phonemes that are unique to a speaker with respect to his/her acoustically closest speaker. In both cases (feature level and phoneme level), a considerable improvement in classification accuracy over conventional GMM-based classifiers is observed in the above-mentioned tasks. Among the three experimental setups, the speaker identification task using unique phonemes shows a performance improvement of as much as 9.56% over the conventional GMM-based classifier.
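The idea of removing common feature vectors during classification can be illustrated with a toy example. The sketch below is not the method of the paper: it assumes hand-set, one-dimensional, diagonal-covariance GMMs for two hypothetical speakers, and it treats a frame as "common" when the log-likelihood gap between the two best-scoring models falls below a hypothetical threshold `min_margin`. All names and values are illustrative assumptions.

```python
import math


def gmm_loglik(x, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def classify(frames, models, min_margin=0.0):
    """Return (best class, number of frames used).

    Frames whose best-vs-second-best log-likelihood gap is below
    min_margin are skipped as "common" (an assumed criterion).
    Requires at least two models and at least one frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    totals = {name: 0.0 for name in models}
    kept = 0
    for x in frames:
        scores = {name: gmm_loglik(x, g) for name, g in models.items()}
        ranked = sorted(scores.values(), reverse=True)
        if ranked[0] - ranked[1] < min_margin:
            continue  # frame does not discriminate between classes
        kept += 1
        for name, s in scores.items():
            totals[name] += s
    if kept == 0:
        # Every frame was judged common: fall back to using all frames.
        return classify(frames, models, 0.0)
    return max(totals, key=totals.get), kept


# Toy models: both speakers share a component at 0; each has one unique component.
models = {
    "speaker_a": [(0.5, [0.0], [1.0]), (0.5, [-3.0], [1.0])],
    "speaker_b": [(0.5, [0.0], [1.0]), (0.5, [3.0], [1.0])],
}
# Three frames near the shared component, two near speaker_a's unique component.
frames = [[0.1], [-0.1], [0.0], [-2.8], [-3.2]]

print(classify(frames, models))                  # conventional: all frames scored
print(classify(frames, models, min_margin=1.0))  # common frames skipped
```

With `min_margin=0.0` the function reduces to conventional GMM scoring over all frames; raising the threshold bases the decision only on frames that separate the classes.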
Notes
In the present study, N_k is not normalized, as this does not affect its use in (14).
Since the number of examples available for each of the phonemes used in this work is small, the product of likelihood-Gaussians used in the feature-level approach cannot be applied.
About this article
Cite this article
Bharathi, B., Arun Kumar, C. & Nagarajan, T. Improving the performance of speaker and language identification tasks using unique characteristics of a class. Int J Speech Technol 16, 115–124 (2013). https://doi.org/10.1007/s10772-012-9167-z
DOI: https://doi.org/10.1007/s10772-012-9167-z