
A nonlinear autoregressive model for speaker verification

International Journal of Speech Technology

Abstract

Gaussian Mixture Models (GMMs) have been the most popular approach in speaker recognition and verification for over two decades. The inefficiencies of this model for signals such as speech are well documented and include an inability to model temporal dependencies that result from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to a lack of generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate when compared to GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed.
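The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate of a verification system are equal. As a rough illustration of how that figure is typically obtained from verification scores, here is a minimal Python sketch. The function name `equal_error_rate`, the score arrays, and the threshold sweep over observed scores are assumptions made for this example; they are not taken from the paper, and the scores shown are made up.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration only. A trial is accepted when its
    score is greater than or equal to the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every score seen in either set, in ascending order.
    thresholds = np.sort(np.concatenate([target, impostor]))

    # False rejection rate: fraction of true-speaker trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor trials scored at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Made-up scores (e.g. log-likelihood ratios) purely for demonstration.
    eer, threshold = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
    print(f"EER = {eer:.2f} at threshold {threshold}")
```

With finite score sets the two error curves rarely cross exactly, so this sketch averages the two rates at the closest point; evaluation toolkits often interpolate instead.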




Correspondence to Joseph Picone.

Additional information

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0414450. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


About this article

Cite this article

Srinivasan, S., Ma, T., Lazarou, G. et al. A nonlinear autoregressive model for speaker verification. Int J Speech Technol 17, 17–25 (2014). https://doi.org/10.1007/s10772-013-9201-9


  • DOI: https://doi.org/10.1007/s10772-013-9201-9
