Abstract
Multilingual speaker verification poses the challenge of verifying a speaker's identity across multiple languages. Existing systems are built on i-vector/x-vector approaches combined with Bi-LSTMs and are trained to discriminate between speakers irrespective of language. Instead of exploring the design space manually, we propose NeuralMultiling, a neural architecture search for multilingual speaker verification suited to mobile devices. The algorithm first searches for an optimal combination of operations within neural cells, using different architectures for normal cells and reduction cells, and then derives a CNN model by stacking these cells. Using the derived architecture, we conducted two studies on the publicly available Multilingual Audio-Visual Smartphone (MAVS) dataset: 1) a language-agnostic condition and 2) interoperability between languages and devices. The experimental results suggest that the derived architecture significantly outperforms the existing Autospeech method, reducing the Equal Error Rate (EER) by 5–6% with fewer model parameters.
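Since the headline result is reported as an Equal Error Rate, a small illustration of how an EER can be computed from verification scores may help. The sketch below is not the authors' evaluation code; the function name `equal_error_rate`, the score lists, and the convention that a higher score means "same speaker" are all assumptions made for illustration. It sweeps every observed score as a decision threshold and returns the operating point where the false acceptance rate and false rejection rate are closest.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from two lists of similarity scores.

    Assumption: higher scores indicate a more likely genuine (same-speaker) trial.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    # Every distinct observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine) | set(impostor))
    best = None
    for t in thresholds:
        # False acceptance rate: impostor trials accepted at threshold t.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine trials rejected at threshold t.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (hypothetical, not taken from the MAVS experiments).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)  # prints: 0.25 0.6
```

With finite trial lists the two error curves rarely cross exactly, so taking the midpoint at the closest threshold is one common approximation; interpolating between adjacent thresholds is an alternative.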
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Aravinda Reddy, P.N., Ramachandra, R., Rao, K.S., Mitra, P. (2025). NeuralMultiling: A Novel Neural Architecture Search for Smartphone Based Multilingual Speaker Verification. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_26
DOI: https://doi.org/10.1007/978-3-031-78341-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78340-1
Online ISBN: 978-3-031-78341-8
eBook Packages: Computer Science (R0)