Abstract
In linguistics, phonemes are the atomic units of sound; acting as word segmentors, they play an important role in recognizing words correctly. This paper proposes a novel approach to recognizing seven Bengali vowels and ten diphthongs (a diphthong is a syllable formed by pronouncing two consecutive vowels). Before feature extraction, a novel pre-processing technique based on amplitude interpolation aligns the starting points of all phonemes of the same class, which in turn boosts the recognition rate. Audio clips of the seven vowels and ten diphthongs, each uttered ten times by twenty speakers of different age groups and sexes, were recorded to create a data set of 3400 audio samples (17 classes × 20 speakers × 10 repetitions). For each vowel and diphthong class, one sample selected by a linguist serves as the benchmark. Each recorded clip is then interpolated to match the benchmark clip of the corresponding phoneme by finding the valleys in the amplitude using the Lagrange interpolation technique. Next, 19 MFCC (Mel Frequency Cepstral Coefficient) speech features are extracted from each phoneme of the interpolated clips and fed to Support Vector Machine (SVM), k-Nearest Neighbour (KNN) and Deep Neural Network (DNN) classifiers; the average classification accuracies obtained for vowels and diphthongs are 94.93% and 94.56%, respectively. To check the effectiveness of the proposed pre-processing technique, the same MFCC features were extracted from the raw recorded phonemes and fed to the same classifiers, giving average accuracies of 89.21% for vowels and 88.56% for diphthongs, roughly 5.7 and 6.0 percentage points lower than with pre-processing. The best accuracy was obtained with the DNN classifier: 98.16% for vowels and 97% for diphthongs.
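As a rough illustration of the valley-based alignment idea, the following Python sketch locates valleys in an amplitude sequence and resamples a clip onto a benchmark's timeline using Lagrange interpolation. The abstract does not spell out the exact procedure, so everything here is an assumption rather than the authors' implementation: the function names (`find_valleys`, `lagrange_interpolate`, `sample_at`, `warp_to_benchmark`), the definition of a valley as a strict local minimum, the four-sample interpolation window, and the piecewise-linear mapping between valley positions are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def find_valleys(amplitude: Sequence[float]) -> List[int]:
    """Return indices of strict local minima (assumed definition of a valley)."""
    return [
        i
        for i in range(1, len(amplitude) - 1)
        if amplitude[i] < amplitude[i - 1] and amplitude[i] < amplitude[i + 1]
    ]


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the Lagrange polynomial passing through (xs[k], ys[k])."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                basis *= (x - xj) / (xk - xj)
        total += yk * basis
    return total


def sample_at(signal: Sequence[float], pos: float) -> float:
    """Estimate the signal at a fractional position from up to four neighbours."""
    i = math.floor(pos)
    lo = max(0, i - 1)
    hi = min(len(signal), i + 3)
    xs = list(range(lo, hi))
    return lagrange_interpolate(xs, [signal[k] for k in xs], pos)


def warp_to_benchmark(clip: Sequence[float], benchmark: Sequence[float]) -> List[float]:
    """Resample clip so that its valleys land on the benchmark's valleys.

    Hypothetical scheme: valleys are paired in order, time is mapped linearly
    between consecutive pairs, and the clip is read at the mapped positions.
    """
    clip_v = find_valleys(clip)
    bench_v = find_valleys(benchmark)
    n = min(len(clip_v), len(bench_v))
    if n < 2:
        return list(clip)  # too few anchor valleys to define a mapping
    clip_v, bench_v = clip_v[:n], bench_v[:n]
    out = []
    for t in range(bench_v[0], bench_v[-1] + 1):
        s = 0
        while s < n - 2 and t > bench_v[s + 1]:
            s += 1  # advance to the valley-to-valley segment containing t
        frac = (t - bench_v[s]) / (bench_v[s + 1] - bench_v[s])
        pos = clip_v[s] + frac * (clip_v[s + 1] - clip_v[s])
        out.append(sample_at(clip, pos))
    return out
```

For example, `warp_to_benchmark([5, 3, 1, 2, 4, 2, 0, 3], [3, 1, 2, 0, 4])` compresses the span between the clip's valleys (indices 2 and 6) onto the span between the benchmark's valleys (indices 1 and 3). MFCC extraction and classification are outside the scope of this sketch.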
Funding
The authors did not receive funds, grants, or other support from any organization for conducting this study or preparing this manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest regarding the preparation and submission of this manuscript.
About this article
Cite this article
Paul, B., Phadikar, S. A novel pre-processing technique of amplitude interpolation for enhancing the classification accuracy of Bengali phonemes. Multimed Tools Appl 82, 7735–7755 (2023). https://doi.org/10.1007/s11042-022-13594-5
DOI: https://doi.org/10.1007/s11042-022-13594-5