Improving the performance of speaker and language identification tasks using unique characteristics of a class

International Journal of Speech Technology

Abstract

In classification tasks, the error rate is proportional to the commonality among classes. In the conventional GMM-based modeling technique, the model parameters of a class are estimated without considering the other classes in the system, so features that are common across classes may be captured along with the unique ones. This paper proposes using the unique characteristics of a class, separately at the feature level and at the phoneme level, to improve classification accuracy. At the feature level, the performance of a classifier is analyzed by capturing the unique features during modeling and removing common feature vectors during classification. Experiments were conducted on a speaker identification task, using speech data of 40 female speakers from the NTIMIT corpus, and on a language identification task, using speech data of two languages (English and French) from the OGI_MLTS corpus. At the phoneme level, the performance of a classifier is analyzed on a speaker identification task by identifying a subset of phonemes that are unique to a speaker with respect to his/her acoustically closest speaker. In both cases (feature level and phoneme level), a considerable improvement in classification accuracy is observed over conventional GMM-based classifiers on the above-mentioned tasks. Among the three experimental setups, the speaker identification task using unique phonemes shows a performance improvement of as much as 9.56% over the conventional GMM-based classifier.
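The idea of removing common feature vectors during classification can be illustrated with a minimal sketch. It is not the authors' method, whose criterion is defined in the full text and not in this preview. Two simplifications are assumed here: each class is modeled by a single diagonal Gaussian (a one-component stand-in for a GMM), and a frame is treated as "common" when its best and second-best class log-likelihoods differ by less than a hypothetical `margin`. The function names, the synthetic data, and the margin value are all illustrative.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance (one-component stand-in for a GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def log_likelihood(frame, model):
    """Log density of one frame under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))


def classify(frames, models, margin=0.0):
    """Return (best label, number of frames used).

    Hypothetical rule: a frame is 'common' and skipped when its best and
    second-best class scores differ by less than `margin`. With margin=0.0
    every frame is kept, which is the conventional decision rule.
    Assumes at least two classes in `models`.
    """
    totals = {label: 0.0 for label in models}
    kept = 0
    for frame in frames:
        scores = {label: log_likelihood(frame, m) for label, m in models.items()}
        ranked = sorted(scores.values(), reverse=True)
        if ranked[0] - ranked[1] < margin:
            continue  # frame does not discriminate between classes
        kept += 1
        for label, score in scores.items():
            totals[label] += score
    if kept == 0 and margin > 0.0:
        # Every frame was judged common: fall back to using all frames.
        return classify(frames, models, margin=0.0)
    return max(totals, key=totals.get), kept


def sample(center, n):
    """Draw n synthetic 2-D frames around `center` with unit variance."""
    return [[random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)]
            for _ in range(n)]


random.seed(0)
shared = (0.0, 0.0)  # region occupied by both classes (the "common" features)
train = {
    "A": sample(shared, 200) + sample((3.0, 0.0), 200),
    "B": sample(shared, 200) + sample((-3.0, 0.0), 200),
}
models = {label: fit_diag_gaussian(frames) for label, frames in train.items()}

# Test utterance from class A: half common frames, half class-specific frames.
test_frames = sample(shared, 50) + sample((3.0, 0.0), 50)
label_all, kept_all = classify(test_frames, models, margin=0.0)
label_filtered, kept_filtered = classify(test_frames, models, margin=1.0)
print(label_all, kept_all, label_filtered, kept_filtered)
```

On this toy data both decision rules pick the same class, so the sketch shows only the mechanics of discarding non-discriminative frames, not the accuracy gains reported in the abstract.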

Notes

  1. In the present study, N_k is not normalized, as this does not affect its use in (14).

  2. Since the number of examples for each of the phonemes used in this work is small, the product of likelihood-Gaussians used in the feature-level approach cannot be used.

Corresponding author

Correspondence to B. Bharathi.

About this article

Cite this article

Bharathi, B., Arun Kumar, C. & Nagarajan, T. Improving the performance of speaker and language identification tasks using unique characteristics of a class. Int J Speech Technol 16, 115–124 (2013). https://doi.org/10.1007/s10772-012-9167-z

  • DOI: https://doi.org/10.1007/s10772-012-9167-z
