Identification of Nasalization and Nasal Assimilation from Children’s Speech

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11987)

Abstract

In children, nasalization is a commonly observed phonological process in which non-nasal sounds are substituted with nasal sounds. This work attempts to identify nasalization and nasal assimilation. The properties of nasal sounds and nasalized voiced sounds are explored using MFCCs extracted from the Hilbert envelope of the numerator of group delay (HNGD) spectrum. The HNGD spectrum highlights the formants in speech and the extra nasal formant in the vicinity of the first formant in nasalized voiced sounds. Features extracted from correctly pronounced and mispronounced words are compared using the Dynamic Time Warping (DTW) algorithm. The nature of the deviation of the DTW comparison path from its diagonal behavior is analyzed to identify mispronunciation. The combination of FFT-based MFCCs and HNGD-spectrum-based MFCCs is observed to achieve the highest accuracy, 82.22%, within a tolerance range of ±50 ms.
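The DTW comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the local distance, the step pattern, or how the path deviation is measured, so the choices here are assumptions: Euclidean frame distance, the basic symmetric step pattern, and deviation measured in frames from the straight line joining the path's endpoints. The function names `dtw_path` and `diagonal_deviation` are hypothetical, and the toy one-dimensional sequences stand in for MFCC frames.

```python
import numpy as np

def dtw_path(ref, test):
    """Align two feature sequences of shape (n, d) and (m, d).

    Assumes Euclidean frame distance and steps (1,1), (1,0), (0,1).
    Returns the accumulated cost and the warping path as (i, j) pairs.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(ref), len(test)
    # Pairwise distance between every reference and test frame.
    dist = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end; ties prefer the diagonal step.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n, m], path

def diagonal_deviation(path, n, m):
    """Signed distance (in test frames) of each path point from the diagonal."""
    slope = (m - 1) / max(n - 1, 1)
    return np.array([j - slope * i for i, j in path])

# Toy example: the test sequence holds the second frame longer than the reference.
ref = np.array([[0.0], [1.0], [2.0], [3.0]])
test = np.array([[0.0], [1.0], [1.0], [1.0], [2.0], [3.0]])
cost, path = dtw_path(ref, test)
dev = diagonal_deviation(path, len(ref), len(test))
print(cost, path, np.abs(dev).max())
```

In this sketch an undistorted pair yields a path on the diagonal, while a stretched or substituted region bends the path away from it. Converting a frame index to milliseconds for a tolerance check would additionally require the frame shift, which the abstract does not state.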

Acknowledgment

The authors would like to thank the Cognitive Science Research Initiative (CSRI), Department of Science & Technology, Government of India, Grant no. SR/CSRI/49/2015, for its financial support on this work.

Author information

Corresponding author

Correspondence to Pravin Bhaskar Ramteke.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ramteke, P.B., Supanekar, S., Aithal, V., Koolagudi, S.G. (2020). Identification of Nasalization and Nasal Assimilation from Children’s Speech. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_23

  • DOI: https://doi.org/10.1007/978-3-030-66187-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66186-1

  • Online ISBN: 978-3-030-66187-8

  • eBook Packages: Computer Science, Computer Science (R0)
