A nonlinear autoregressive model for speaker verification

Srinivasan, Sundararajan; Ma, Tao; Lazarou, Georgios; Picone, Joseph

doi:10.1007/s10772-013-9201-9

Published: 06 June 2013

International Journal of Speech Technology, Volume 17, pages 17–25 (2014)

Abstract

Gaussian Mixture Models (GMMs) have been the most popular approach in speaker recognition and verification for over two decades. The inefficiencies of these models for signals such as speech are well documented and include an inability to model temporal dependencies that result from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to a lack of generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate than GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed.
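The abstract does not state the model equations, so the following is only a minimal sketch of what a mixture autoregressive likelihood can look like. It assumes a scalar feature stream, Gaussian autoregressive components, and softmax mixing weights that depend on the past samples. The function name `mixar_log_likelihood` and all parameter names are hypothetical and are not taken from the paper.

```python
import numpy as np


def mixar_log_likelihood(x, ar, bias, sigma, gate_w, gate_b):
    """Log-likelihood of a scalar sequence under a mixture of AR components.

    Assumed (illustrative) model with m components of order p:
      mean_k(t)   = bias[k] + sum_i ar[k, i] * x[t - 1 - i]
      weight_k(t) = softmax_k(gate_b[k] + sum_i gate_w[k, i] * x[t - 1 - i])
      p(x[t] | past) = sum_k weight_k(t) * Normal(x[t]; mean_k(t), sigma[k]^2)
    Setting gate_w to zeros gives fixed (input-independent) mixing weights.
    """
    x = np.asarray(x, dtype=float)
    ar = np.atleast_2d(np.asarray(ar, dtype=float))          # shape (m, p)
    bias = np.asarray(bias, dtype=float)                     # shape (m,)
    sigma = np.asarray(sigma, dtype=float)                   # shape (m,)
    gate_w = np.atleast_2d(np.asarray(gate_w, dtype=float))  # shape (m, p)
    gate_b = np.asarray(gate_b, dtype=float)                 # shape (m,)

    p = ar.shape[1]
    total = 0.0
    # The first p samples only serve as context; they are not scored.
    for t in range(p, len(x)):
        past = x[t - p:t][::-1]  # most recent sample first
        logits = gate_b + gate_w @ past
        log_w = logits - np.logaddexp.reduce(logits)  # log-softmax
        mean = bias + ar @ past
        log_pdf = (-0.5 * np.log(2.0 * np.pi * sigma ** 2)
                   - 0.5 * ((x[t] - mean) / sigma) ** 2)
        # Log of the weighted sum over components, computed stably.
        total += np.logaddexp.reduce(log_w + log_pdf)
    return float(total)
```

Because the component means and the mixing weights both depend on the preceding samples, a model of this shape can represent temporal dependencies that a GMM over static features cannot. A verification score could then be formed, for instance, as the difference between the log-likelihoods under a claimed-speaker model and a background model; that scoring rule is a common convention assumed here, not a detail given in the abstract.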

Author information

Authors and Affiliations

Nuance Communications Inc., 1198 East Arques Avenue, Sunnyvale, CA, 94085, USA
Sundararajan Srinivasan
Apple Inc., 2 Infinite Loop, mailstop 302-4APP, Cupertino, CA, 95014, USA
Tao Ma
The New York City Transit Authority, 30-74 38th Street, Apt 1A, Astoria, New York, NY, 11103, USA
Georgios Lazarou
Department of Electrical and Computer Engineering, Temple University, 1947 North 12th Street, Philadelphia, PA, 19027, USA
Joseph Picone

Corresponding author

Correspondence to Joseph Picone.

Additional information

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0414450. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

About this article

Cite this article

Srinivasan, S., Ma, T., Lazarou, G. et al. A nonlinear autoregressive model for speaker verification. Int J Speech Technol 17, 17–25 (2014). https://doi.org/10.1007/s10772-013-9201-9

Received: 12 February 2013
Accepted: 23 May 2013
Published: 06 June 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10772-013-9201-9
