Development of novel automated language classification model using pyramid pattern technique with speech signals

Akbal, Erhan; Barua, Prabal Datta; Tuncer, Turker; Dogan, Sengul; Acharya, U. Rajendra

doi:10.1007/s00521-022-07613-7

Original Article
Published: 25 July 2022

Neural Computing and Applications, Volume 34, pages 21319–21333 (2022)

Erhan Akbal¹,
Prabal Datta Barua²,³,
Turker Tuncer¹,
Sengul Dogan¹ &
…
U. Rajendra Acharya⁴,⁵,⁶ (ORCID: orcid.org/0000-0003-2689-8552)

449 Accesses
4 Citations

Abstract

Language classification from speech is a complex problem in machine learning and pattern recognition. Various text- and image-based language classification methods have been presented, but speech-based methods remain limited in the literature. Moreover, previously presented models classified only a limited number of languages, and few of them addressed accents. This work presents an automated handcrafted language classification model. A novel pyramid pattern is presented for feature extraction; statistical features and maximum pooling are also used to generate the features. We developed our speech-language classification model using two datasets: (i) a new large speech dataset that we created, containing 14,500 speech samples in 29 languages, and (ii) the VoxForge dataset. Neighborhood component analysis is used to select the 1000 most informative features from the generated features, and these features are classified using a quadratic support vector machine (QSVM) classifier. The developed method yielded accuracies of 98.87 ± 0.30% and 97.12 ± 1.27% on our dataset and the VoxForge dataset, respectively. Geometric mean, average precision, and F1-score are also calculated and presented in the results section. Our results indicate that the proposed pyramid pattern-based method classifies spoken languages accurately on two large speech-language datasets.
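The abstract names maximum pooling and statistical features as two of the feature-generation steps. The following is a minimal sketch of how such a multilevel step could look, not the authors' implementation: the pooling width of 2, the number of levels, the particular statistics, and the function names `max_pool`, `stat_features`, and `multilevel_features` are all assumptions made for illustration. The pyramid pattern itself is not specified in the abstract and is therefore not reproduced here.

```python
import statistics
from typing import List


def max_pool(signal: List[float], width: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped.

    The width of 2 is an assumption, not a value taken from the paper.
    """
    usable = len(signal) - len(signal) % width
    return [max(signal[i:i + width]) for i in range(0, usable, width)]


def stat_features(signal: List[float]) -> List[float]:
    """A small illustrative set of statistics (not the paper's list)."""
    return [
        statistics.fmean(signal),   # arithmetic mean
        statistics.pstdev(signal),  # population standard deviation
        min(signal),
        max(signal),
        statistics.median(signal),
    ]


def multilevel_features(signal: List[float], levels: int = 3) -> List[float]:
    """Concatenate statistics of the raw signal and of each pooled level."""
    features = stat_features(signal)
    current = signal
    for _ in range(levels):
        current = max_pool(current)
        if not current:  # signal too short for further pooling
            break
        features.extend(stat_features(current))
    return features


if __name__ == "__main__":
    # Toy values standing in for speech samples (hypothetical data).
    toy = [0.1, -0.4, 0.3, 0.9, -0.2, 0.0, 0.5, -0.7]
    print(max_pool(toy))
    print(len(multilevel_features(toy, levels=2)))
```

In a pipeline of the kind the abstract describes, a feature vector like this would be combined with the pattern-based features before feature selection and classification; those later stages are omitted from this sketch.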

Author information

Authors and Affiliations

Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
Erhan Akbal, Turker Tuncer & Sengul Dogan
School of Business (Information System), University of Southern Queensland, Toowoomba, QLD, 4350, Australia
Prabal Datta Barua
Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, 2007, Australia
Prabal Datta Barua
Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, 599489, Singapore
U. Rajendra Acharya
Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore, Singapore
U. Rajendra Acharya
Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan
U. Rajendra Acharya

Corresponding author

Correspondence to U. Rajendra Acharya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Akbal, E., Barua, P.D., Tuncer, T. et al. Development of novel automated language classification model using pyramid pattern technique with speech signals. Neural Comput & Applic 34, 21319–21333 (2022). https://doi.org/10.1007/s00521-022-07613-7

Received: 05 November 2021
Accepted: 04 July 2022
Published: 25 July 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00521-022-07613-7
