Abstract
This work aims to identify dialects of the Konkani language. Various state-of-the-art methods from language identification are explored for Konkani dialect identification (DID). An initial base model is constructed using a fully connected neural network trained on frame-level Mel-frequency cepstral coefficient (MFCC) features. This base model is then compared with state-of-the-art models from the language identification task, adapted here for DID, that use utterance-level embeddings, namely the x-vector and u-vector models. The x-vector- and u-vector-based models are trained on segment-level features; this work explores two such features, phone-state bottleneck features (BNFs) and wav2vec features, both extracted from pretrained feature extractors. The x-vector-based model uses a time delay neural network (TDNN) to extract an utterance-level embedding from a sequence of speech segments, while the u-vector-based model uses a bidirectional LSTM (BLSTM) for the same purpose. This work also proposes a novel transformer-based model to extract utterance-level embeddings from a sequence of speech segments. Results show the effectiveness of the explored methods for DID of Konkani: the proposed transformer-based model outperforms the other explored models, and wav2vec features prove superior to phone-state BNFs for the DID task.
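The common thread across the x-vector, u-vector, and transformer-based models is pooling a variable-length sequence of segment-level features into one fixed-size utterance-level embedding. A minimal sketch of the transformer-style variant is given below, using a single self-attention layer followed by temporal mean pooling; the dimensions, random weights, and random "segment features" are illustrative stand-ins, not the paper's actual architecture or data.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) sequence of segment-level features (e.g. wav2vec or BNF vectors).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scaled dot-product attention: each segment attends to all segments.
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V  # (T, d) context-aware segment representations

d = 16                                   # hypothetical feature dimension
T = 10                                   # number of speech segments in the utterance
X = rng.standard_normal((T, d))          # stand-in for segment-level features
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

H = self_attention(X, Wq, Wk, Wv)
u_embedding = H.mean(axis=0)             # temporal pooling -> utterance-level embedding
print(u_embedding.shape)                 # one fixed-size vector per utterance
```

In a trained system this embedding would feed a dialect classifier; mean pooling is used here for brevity, whereas the actual models may use statistics pooling (x-vector) or learned attention weights.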
S. Monteiro and A. Angra contributed equally.
Acknowledgement
This work resulted from research supported by the Ministry of Electronics & Information Technology (MeitY), Government of India, through the project titled "National Language Translation Mission (NLTM): BHASHINI".
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Monteiro, S., Angra, A., H., M., Thenkanidiyoor, V., Dileep, A.D. (2023). Exploring the Impact of Different Approaches for Spoken Dialect Identification of Konkani Language. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science(), vol 14339. Springer, Cham. https://doi.org/10.1007/978-3-031-48312-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48311-0
Online ISBN: 978-3-031-48312-7
eBook Packages: Computer Science, Computer Science (R0)