NeuralMultiling: A Novel Neural Architecture Search for Smartphone Based Multilingual Speaker Verification

  • Conference paper
Pattern Recognition (ICPR 2024)

Abstract

Multilingual speaker verification introduces the challenge of verifying a speaker across multiple languages. Existing systems were built using i-vector/x-vector approaches along with Bi-LSTMs, which were trained to discriminate between speakers irrespective of the language. Instead of exploring the design space manually, we propose NeuralMultiling, a neural architecture search for multilingual speaker verification suitable for mobile devices. First, our algorithm searches for an optimal combination of operations for the neural cells, with different architectures for normal cells and reduction cells, and then derives a CNN model by stacking these cells. Using the derived architecture, we performed two studies on the publicly available Multilingual Audio-Visual Smartphone (MAVS) dataset: 1) a language-agnostic condition and 2) interoperability between languages and devices. The experimental results suggest that the derived architecture significantly outperforms the existing AutoSpeech method, with a 5–6% reduction in the Equal Error Rate (EER) and fewer model parameters.
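The abstract reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how this metric is commonly computed from verification scores (not the authors' evaluation code), the following assumes that higher scores mean "same speaker"; the function name `equal_error_rate` and the example scores are hypothetical.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every distinct score seen in either set.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # FAR: fraction of impostor trials wrongly accepted at each threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # FRR: fraction of genuine trials wrongly rejected at each threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; take the closest point.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Hypothetical scores, for illustration only.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite score set the two error curves rarely cross exactly, so this sketch averages FAR and FRR at the closest threshold; interpolating between neighbouring thresholds is a common refinement.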

Author information

Corresponding author

Correspondence to P. N. Aravinda Reddy.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Aravinda Reddy, P.N., Ramachandra, R., Rao, K.S., Mitra, P. (2025). NeuralMultiling: A Novel Neural Architecture Search for Smartphone Based Multilingual Speaker Verification. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_26

  • DOI: https://doi.org/10.1007/978-3-031-78341-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78340-1

  • Online ISBN: 978-3-031-78341-8

  • eBook Packages: Computer Science (R0)
