NeuralMultiling: A Novel Neural Architecture Search for Smartphone Based Multilingual Speaker Verification

Aravinda Reddy, P. N.; Ramachandra, Raghavendra; Rao, K. Sreenivasa; Mitra, Pabitra

doi:10.1007/978-3-031-78341-8_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15314)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Multilingual speaker verification poses the challenge of verifying a speaker across multiple languages. Existing systems are built on i-vector/x-vector approaches combined with Bi-LSTMs, trained to discriminate speakers irrespective of language. Instead of exploring the design space manually, we propose NeuralMultiling, a neural architecture search for multilingual speaker verification suitable for mobile devices. Our algorithm first searches for an optimal combination of operations within neural cells, allowing different architectures for normal cells and reduction cells, and then derives a CNN model by stacking these cells. Using the derived architecture, we performed two studies on the publicly available Multilingual Audio-Visual Smartphone (MAVS) dataset: 1) a language-agnostic condition and 2) interoperability between languages and devices. The experimental results suggest that the derived architecture significantly outperforms the existing AutoSpeech method, with a 5–6% reduction in the Equal Error Rate (EER) and fewer model parameters.
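
For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The following is a minimal sketch of that general definition, not the authors' evaluation code. The function name `equal_error_rate`, the threshold sweep over observed scores, and the convention that a higher score means "same speaker" are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER for similarity scores (higher = more likely same speaker).

    Hypothetical helper for illustration only. Returns (eer, threshold).
    """
    genuine = list(genuine_scores)
    impostor = list(impostor_scores)
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    # Sweep every observed score as a candidate decision threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # FRR: fraction of genuine trials rejected (score below threshold).
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # FAR: fraction of impostor trials accepted (score at or above threshold).
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Average FAR and FRR at the closest crossing point.
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented for this example.
    eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With real trial lists the sweep is usually done on sorted score arrays or with interpolation between adjacent thresholds; the plain loop here favours readability over speed.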

Author information

Authors and Affiliations

Indian Institute of Technology Kharagpur, Kharagpur, India
P. N. Aravinda Reddy, K. Sreenivasa Rao & Pabitra Mitra
Norwegian University of Science and Technology (NTNU), Gjøvik, Norway
Raghavendra Ramachandra

Corresponding author

Correspondence to P. N. Aravinda Reddy.

Editor information

Editors and Affiliations

University of Salford, Salford, Lancashire, UK
Apostolos Antonacopoulos
Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, West Bengal, India
Saumik Bhattacharya
Indian Statistical Institute Kolkata, Kolkata, West Bengal, India
Umapada Pal

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 335 KB)

Cite this paper

Aravinda Reddy, P.N., Ramachandra, R., Rao, K.S., Mitra, P. (2025). NeuralMultiling: A Novel Neural Architecture Search for Smartphone Based Multilingual Speaker Verification. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_26

DOI: https://doi.org/10.1007/978-3-031-78341-8_26
Published: 02 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78340-1
Online ISBN: 978-3-031-78341-8
eBook Packages: Computer Science, Computer Science (R0)
