A novel pre-processing technique of amplitude interpolation for enhancing the classification accuracy of Bengali phonemes

Paul, Bachchu; Phadikar, Santanu

doi:10.1007/s11042-022-13594-5

Published: 08 August 2022

Volume 82, pages 7735–7755, (2023)

Multimedia Tools and Applications

Abstract

In linguistics, phonemes are the atomic units of sound; they act as word segmenters and play an important role in recognizing words correctly. This paper proposes a novel approach to recognizing seven Bengali vowels and ten diphthongs (a diphthong is a syllable formed by pronouncing two consecutive vowels). In the proposed method, before feature extraction, a novel pre-processing technique based on amplitude interpolation aligns the starting points of all phonemes of the same class, which in turn boosts the recognition rate. Audio clips of the seven Bengali vowels and ten diphthongs, each uttered ten times by twenty speakers of different age groups and sexes, were recorded to create a data set of 3400 audio samples. For each class of vowel and diphthong, one sample (selected by a linguist) was taken as the benchmark. Each recorded clip was then interpolated to match the benchmark clip of the corresponding phoneme by finding the valleys in the amplitude using the Lagrange interpolation technique. After that, 19 MFCC (Mel Frequency Cepstral Coefficient) speech features were extracted from each interpolated clip and fed to Support Vector Machine (SVM), k-Nearest Neighbour (KNN) and Deep Neural Network (DNN) classifiers; the average classification accuracies obtained for vowels and diphthongs were 94.93% and 94.56%, respectively. To check the effectiveness of the proposed pre-processing technique, the same MFCC features were extracted from the raw recorded phonemes and fed to the same classifiers; the average accuracies obtained for vowels and diphthongs were 89.21% and 88.56%, respectively, which shows the benefit of the proposed method. The best accuracy was obtained with the DNN classifier: 98.16% for vowels and 97% for diphthongs.
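The abstract names two building blocks of the pre-processing step: locating valleys in the amplitude and applying Lagrange interpolation. The sketch below illustrates those two ideas in plain Python. It is not the authors' implementation: the function names `find_valleys`, `lagrange_interpolate` and `resample_segment`, the definition of a valley as a strict local minimum, and the use of a local 3-point polynomial for resampling are all assumptions made for illustration.

```python
from typing import List, Sequence


def find_valleys(amplitude: Sequence[float]) -> List[int]:
    """Return indices of strict local minima (assumed meaning of 'valleys')."""
    return [
        i
        for i in range(1, len(amplitude) - 1)
        if amplitude[i] < amplitude[i - 1] and amplitude[i] < amplitude[i + 1]
    ]


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def resample_segment(segment: Sequence[float], target_len: int) -> List[float]:
    """Stretch or shrink a segment to target_len samples.

    Hypothetical helper: each output sample is taken from a local 3-point
    Lagrange polynomial, which avoids the oscillation of one high-degree
    polynomial fitted through the whole segment.
    """
    n = len(segment)
    if n < 3 or target_len < 2:
        raise ValueError("need at least 3 input samples and 2 output samples")
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)           # position in input index units
        centre = min(max(int(round(pos)), 1), n - 2)   # keep the 3-point window in range
        xs = [centre - 1, centre, centre + 1]
        ys = [segment[i] for i in xs]
        out.append(lagrange_interpolate(xs, ys, pos))
    return out


if __name__ == "__main__":
    clip = [3.0, 1.0, 2.0, 0.0, 5.0]
    print(find_valleys(clip))
    print(resample_segment([0.0, 1.0, 4.0, 9.0, 16.0], 9))
```

In a pipeline like the one described, the segment between two detected valleys of a recorded clip could be resampled to the length of the corresponding segment in the benchmark clip before MFCC extraction. How the paper actually pairs valleys and chooses interpolation nodes is not stated in the abstract.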

Funding

The authors did not receive funding, grants, or other support from any organization for conducting this study or for preparing the manuscript.

Author information

Authors and Affiliations

Department of Computer Science, Vidyasagar University, Midnapore, West Bengal, 721102, India
Bachchu Paul
Department of Computer Science & Engineering, Maulana Abul Kalam Azad University of Technology, West Bengal, BF-142, Sector-I, Salt Lake, Kolkata, 700064, India
Santanu Phadikar

Corresponding author

Correspondence to Bachchu Paul.

Ethics declarations

Conflict of interest

The authors have no conflict of interest regarding the preparation and submission of the manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Paul, B., Phadikar, S. A novel pre-processing technique of amplitude interpolation for enhancing the classification accuracy of Bengali phonemes. Multimed Tools Appl 82, 7735–7755 (2023). https://doi.org/10.1007/s11042-022-13594-5

Received: 24 March 2022
Revised: 18 May 2022
Accepted: 18 July 2022
Published: 08 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11042-022-13594-5
