Ethio-Semitic language identification using convolutional neural networks with data augmentation

Alemu, Amlakie Aschale; Melese, Malefia Demilie; Salau, Ayodeji Olalekan

doi:10.1007/s11042-023-17094-y


Multimedia Tools and Applications, Volume 83, pages 34499–34514 (2024)

Published: 26 September 2023

Amlakie Aschale Alemu¹, Malefia Demilie Melese² & Ayodeji Olalekan Salau³,⁴ (ORCID: orcid.org/0000-0002-6264-9783)


Abstract

In today’s digital world, natural language is the main medium through which people exchange information, and it has become an evaluation criterion for technology. Spoken language identification is the task of determining which language a speaker is using, and it serves as a front-end processing step in human-computer interaction. Because language identification (LID) is an intermediate task for other natural language processing tasks such as speech-to-text translation, speech-to-speech translation, speech recognition, and speech information retrieval, we developed an LID model for Ethio-Semitic languages. We used a Convolutional Neural Network (CNN) with different acoustic features, namely Mel-frequency Cepstral Coefficients (MFCC), mel-spectrogram, and combined (MFCC + mel-spectrogram) features, to emphasize the features most useful for identification. The study’s primary goal was to identify four languages: Amharic, Geez, Guragigna, and Tigrigna. The results show that the CNN trained on augmented data with the combined (hybrid) features performed better than with MFCC or mel-spectrogram features alone. The proposed model achieved average accuracies of 97%, 97.4%, and 99.5% on the test, validation, and training sets, respectively. We therefore conclude that the combined (mel-spectrogram + MFCC) feature was the most important feature.
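To make the three feature types compared above concrete, here is a minimal NumPy sketch of how a log mel-spectrogram, MFCCs, and a combined feature could be computed from a waveform, together with one simple augmentation. Everything in it is an assumption for illustration, not the authors' pipeline: the 512-sample frame, 160-sample hop, 40 mel bands, 13 MFCCs, the HTK-style mel formula, combining the two features by frame-wise concatenation, and additive white noise at a target SNR (the article's actual augmentation methods are not listed here). The helper names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed): m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping an (n_fft//2 + 1)-bin power spectrum to n_mels bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at center
    return fb


def log_mel_spectrogram(signal, sr, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the Hann-windowed power spectrum, apply the mel filterbank, log."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft  # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T            # (n_frames, n_mels)
    return np.log(mel + 1e-10)                                   # small floor avoids log(0)


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def combined_features(signal, sr, n_mels=40, n_mfcc=13):
    """Hypothetical 'MFCC + mel-spectrogram' feature: both stacked per frame."""
    log_mel = log_mel_spectrogram(signal, sr, n_mels=n_mels)
    mfcc = dct_ii(log_mel, n_mfcc)                         # MFCC = DCT of log mel energies
    return np.concatenate([mfcc, log_mel], axis=1)         # (n_frames, n_mfcc + n_mels)


def add_noise(signal, snr_db, rng):
    """Assumed augmentation: add white noise scaled to reach the target SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(len(signal))
    scale = np.sqrt(np.mean(signal ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return signal + scale * noise


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                          # 1 s of audio
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)        # synthetic stand-in for a speech clip
    noisy = add_noise(tone, snr_db=20, rng=np.random.default_rng(0))
    print(combined_features(noisy, sr).shape)       # (97, 53)
```

Concatenating per frame keeps the time axis aligned, so the result can be fed to a CNN as a single 2-D "image"; other ways of fusing the two features (for example, as separate input channels) are equally plausible and are not ruled out by the abstract.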


Data availability

The datasets generated and/or analysed during the current study are not publicly available but are available from the corresponding author on reasonable request.

Code availability

Not applicable.

Abbreviations

AMC: Amhara Media Corporation
ANN: Artificial Neural Network
ASR: Automatic Speech Recognition
BLSTM: Bidirectional Long Short-Term Memory
CNN: Convolutional Neural Network
Ethio-Semitic: Ethiopian Semitic
GFCC: Gammatone Frequency Cepstral Coefficients
GMM: Gaussian Mixture Model
LFCC: Linear Frequency Cepstral Coefficients
LID: Language Identification
MFCC: Mel-frequency Cepstral Coefficients
NLP: Natural Language Processing
OBN: Oromia Broadcast Network
PCA: Principal Component Analysis
SVM: Support Vector Machine


Funding

The authors declare that no funding was received for this research.

Author information

Authors and Affiliations

¹ Department of Electrical and Computer Engineering, Gafat Institute of Technology, Debre Tabor University, Debre Tabor, Ethiopia (Amlakie Aschale Alemu)
² Department of Information Technology, Gafat Institute of Technology, Debre Tabor University, Debre Tabor, Ethiopia (Malefia Demilie Melese)
³ Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria (Ayodeji Olalekan Salau)
⁴ Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamil Nadu, India (Ayodeji Olalekan Salau)


Contributions

Amlakie Aschale Alemu: Conceptualization, Methodology, Software, Visualization, Writing - Original draft preparation. Malefia Demilie Melese: Conceptualization, Software, Visualization, Investigation. Ayodeji Olalekan Salau: Data curation, Software, Methodology, Writing - Review and editing, Validation.

Corresponding author

Correspondence to Ayodeji Olalekan Salau.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Alemu, A.A., Melese, M.D. & Salau, A.O. Ethio-Semitic language identification using convolutional neural networks with data augmentation. Multimed Tools Appl 83, 34499–34514 (2024). https://doi.org/10.1007/s11042-023-17094-y


Received: 15 February 2023
Revised: 23 June 2023
Accepted: 15 September 2023
Published: 26 September 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11042-023-17094-y
