Abstract
Natural language is the primary medium through which humans exchange information, and the ability to process it has become a measure of how far technology has advanced. Spoken language identification (LID) is the process of determining which language a speaker is using, and it serves as a front-end processing step in human-computer interaction. In this study, we developed an LID model for Ethio-Semitic languages, because LID is an intermediate task for other Natural Language Processing tasks such as speech-to-text translation, speech-to-speech translation, speech recognition, and speech information retrieval. We trained a Convolutional Neural Network (CNN) on three acoustic feature sets, namely Mel-frequency Cepstral Coefficients (MFCC), mel-spectrogram, and a combination of the two (MFCC + mel-spectrogram), so that the most discriminative features are emphasized and identification is simplified. The study’s primary goal was to distinguish four languages: Amharic, Geez, Guragigna, and Tigrigna. The results show that a CNN trained on augmented data with the combined features performed better than one using MFCC or mel-spectrogram features alone. The proposed model achieved average accuracies of 97%, 97.4%, and 99.5% for testing, validation, and training, respectively. We therefore conclude that the combined (mel-spectrogram + MFCC) feature set was the most effective of the features evaluated.
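The combined feature described above can be illustrated with a minimal sketch that computes a log mel-spectrogram and MFCCs per frame and concatenates them into one vector. This is not the authors' implementation: the frame length, hop size, number of mel bands, number of cepstral coefficients, and the function names (`combined_features`, `mel_filterbank`, and so on) are all assumptions chosen for illustration, and the abstract does not state which values or libraries were used. The code relies only on the Python standard library; in practice an audio library would normally handle these steps.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    win = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sr / 2)
    hz_pts = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_pts[m - 1], hz_pts[m], hz_pts[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_freqs])
    return bank

def dct2(values, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def combined_features(signal, sr, n_fft=256, hop=128, n_mels=20, n_mfcc=13):
    """Per-frame vector: log mel energies followed by MFCCs (assumed layout)."""
    bank = mel_filterbank(n_mels, n_fft, sr)
    feats = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                   for row in bank]
        mfcc = dct2(log_mel, n_mfcc)      # MFCC = DCT of the log mel energies
        feats.append(log_mel + mfcc)      # concatenate the two feature sets
    return feats

# Synthetic 440 Hz tone standing in for a speech clip (illustrative only).
sr = 8000
signal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
features = combined_features(signal, sr)
print(len(features), len(features[0]))  # 7 frames, 20 + 13 = 33 values each
```

Stacking the frames gives a two-dimensional array (time by feature) of the kind a CNN can take as input. Concatenating along the feature axis is only one way to combine the two representations; the abstract does not specify how the combination was performed.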
Data availability
The datasets generated during and/or analysed during the current study are not publicly available but are available from the corresponding author on reasonable request.
Code availability
Not applicable.
Abbreviations
- AMC: Amhara Media Corporation
- ANN: Artificial Neural Network
- ASR: Automatic Speech Recognition
- BLSTM: Bidirectional Long Short-Term Memory
- CNN: Convolutional Neural Network
- GFCC: Gammatone Frequency Cepstral Coefficient
- GMM: Gaussian Mixture Model
- Ethio-Semitic: Ethiopian Semitic
- LFCC: Linear Frequency Cepstral Coefficients
- LID: Language Identification
- MFCC: Mel-frequency Cepstral Coefficients
- NLP: Natural Language Processing
- OBN: Oromia Broadcast Network
- PCA: Principal Component Analysis
- SVM: Support Vector Machine
Funding
Authors declare no funding for this research.
Author information
Contributions
Amlakie Aschale Alemu: Conceptualization, Methodology, Software, Visualization, Writing (original draft preparation). Malefia Demilie Melese: Conceptualization, Software, Visualization, Investigation. Ayodeji Olalekan Salau: Data curation, Software, Methodology, Writing (review and editing), Validation.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Alemu, A.A., Melese, M.D. & Salau, A.O. Ethio-Semitic language identification using convolutional neural networks with data augmentation. Multimed Tools Appl 83, 34499–34514 (2024). https://doi.org/10.1007/s11042-023-17094-y
DOI: https://doi.org/10.1007/s11042-023-17094-y