Abstract
Nasalization is a phonological process commonly observed in children's speech, in which non-nasal sounds are substituted with nasal sounds. This work attempts to identify nasalization and nasal assimilation. The properties of nasal sounds and nasalized voiced sounds are explored using MFCCs extracted from the Hilbert envelope of the numerator of the group delay (HNGD) spectrum. The HNGD spectrum emphasizes the formants in speech, including the extra nasal formant that appears near the first formant in nasalized voiced sounds. Features extracted from correctly pronounced and mispronounced words are compared using the Dynamic Time Warping (DTW) algorithm, and the deviation of the DTW comparison path from diagonal behavior is analyzed to identify mispronunciations. The combination of FFT-based MFCCs and HNGD-spectrum-based MFCCs achieves the highest accuracy, 82.22%, within a tolerance of ±50 ms.
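The DTW comparison step can be illustrated with a minimal sketch. It assumes per-frame feature vectors (for example FFT-based or HNGD-based MFCCs) have already been extracted elsewhere, uses a plain Euclidean local distance, and treats runs of consecutive horizontal or vertical steps in the DTW path as one possible way to quantify "deviation from diagonal behavior". The function names, the `min_run` threshold, and the 10 ms frame hop are illustrative assumptions, not the authors' method. Only the ±50 ms tolerance check mirrors the evaluation criterion stated in the abstract.

```python
import math


def dtw_path(ref, test):
    """Classic DTW between two non-empty sequences of feature vectors.

    Uses Euclidean local distance and the standard step pattern
    (diagonal, vertical, horizontal). Returns (total_cost, path), where
    path is a list of (ref_index, test_index) pairs from (0, 0) to the end.
    """
    n, m = len(ref), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = cheapest accumulated cost aligning ref[:i] with test[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


def non_diagonal_runs(path, min_run=2):
    """Hypothetical deviation measure: return (start, end) test-frame spans
    where the path takes at least `min_run` consecutive non-diagonal steps."""
    spans = []
    run_len = 0
    run_start = run_end = 0
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if i1 - i0 == 1 and j1 - j0 == 1:
            # Diagonal step: close any run that was long enough.
            if run_len >= min_run:
                spans.append((run_start, run_end))
            run_len = 0
        else:
            # Horizontal or vertical step: extend (or start) the current run.
            if run_len == 0:
                run_start = j0
            run_len += 1
            run_end = j1
    if run_len >= min_run:
        spans.append((run_start, run_end))
    return spans


def frames_to_ms(span, hop_ms=10.0):
    """Convert a (start, end) frame span to milliseconds (assumed hop size)."""
    return (span[0] * hop_ms, span[1] * hop_ms)


def within_tolerance(detected_ms, reference_ms, tol_ms=50.0):
    """A detection counts as correct if it lies within ±tol_ms of the label."""
    return abs(detected_ms - reference_ms) <= tol_ms
```

For example, aligning a 5-frame reference with an 8-frame test utterance in which the third frame is held for four frames yields a single vertical run spanning test frames 2 to 5. Real MFCC sequences rarely align at zero cost, so any run-length or distance threshold would need tuning on labelled data.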
Acknowledgment
The authors would like to thank the Cognitive Science Research Initiative (CSRI), Department of Science & Technology, Government of India, Grant no. SR/CSRI/49/2015, for its financial support of this work.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ramteke, P.B., Supanekar, S., Aithal, V., Koolagudi, S.G. (2020). Identification of Nasalization and Nasal Assimilation from Children's Speech. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_23
DOI: https://doi.org/10.1007/978-3-030-66187-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66186-1
Online ISBN: 978-3-030-66187-8
eBook Packages: Computer Science, Computer Science (R0)