On the Use of Convolutional Neural Networks in Pairwise Language Recognition

Lozano-Diez, Alicia; Gonzalez-Dominguez, Javier; Zazo, Ruben; Ramos, Daniel; Gonzalez-Rodriguez, Joaquin

doi:10.1007/978-3-319-13623-3_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8854)

Abstract

Convolutional deep neural networks (CDNNs) have been successfully applied to different tasks within the machine learning field and, in particular, to speech, speaker and language recognition. In this work, we apply them to pairwise language recognition tasks. The proposed systems are evaluated on challenging pairs of languages from the NIST LRE’09 dataset. Results are compared with two spectral reference systems based on Factor Analysis and Total Variability (i-vector) strategies, respectively. Moreover, a simple fusion of the developed approaches and the reference systems has been performed. Some individual and fusion systems outperform the reference systems, obtaining a relative improvement of about 17% in terms of minC_DET for one of the challenging pairs.

Author information

Authors and Affiliations

ATVS - Biometric Recognition Group, Universidad Autonoma de Madrid (UAM), Spain
Alicia Lozano-Diez, Javier Gonzalez-Dominguez, Ruben Zazo, Daniel Ramos & Joaquin Gonzalez-Rodriguez

Editor information

Editors and Affiliations

ETSIT, Las Palmas de Gran Canaria, Spain
Juan Luis Navarro Mesa, Eduardo Hernández Pérez, Pedro Quintana Morales, Antonio Ravelo García & Iván Guerra Moreno
University of Zaragoza, Spain
Alfonso Ortega
Dep. of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Portugal
António Teixeira
ATVS Biometric Recognition Group, Universidad Autónoma de Madrid, Spain
Doroteo T. Toledano

About this paper

Cite this paper

Lozano-Diez, A., Gonzalez-Dominguez, J., Zazo, R., Ramos, D., Gonzalez-Rodriguez, J. (2014). On the Use of Convolutional Neural Networks in Pairwise Language Recognition. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_9

DOI: https://doi.org/10.1007/978-3-319-13623-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)
