Abstract
In this paper, an attempt is made to identify palatal fricative fronting in children’s speech, where the postalveolar /sh/ is mispronounced as the dental /s/. In children’s speech, the concentration of energy (the darkest region of the spectrogram) for /s/ ranges from 4000 Hz to 8000 Hz, whereas for /sh/ it ranges from 3000 Hz to 8000 Hz. The Gammatonegram follows the frequency subbands of the ear, which are wider at higher frequencies. Various spectral properties extracted from the Gammatonegram are proposed for characterizing /sh/ and /s/: spectral centroid, spectral crest factor, spectral decrease, spectral flatness, spectral flux, spectral kurtosis, spectral spread, spectral skewness, spectral slope, and Shannon entropy of the spectrogram (computed over intervals of 2000 Hz). The dataset, taken from the NITK Kids’ Speech Corpus, was recorded from 60 native Kannada-speaking children aged between 3 1/2 and 6 1/2 years. Support vector machines (SVMs) are used for classification. Various combinations of the proposed features are evaluated, along with MFCCs (39) and LPCCs (39). The combination of MFCCs (39), LPCCs (39) and Entropy (4) is observed to achieve the highest mispronunciation identification performance, 83.2983%.
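The band-wise Shannon entropy mentioned in the abstract can be illustrated with a minimal sketch. Several assumptions are made here that the abstract does not confirm: a plain STFT power spectrum stands in for the Gammatonegram, the sampling rate is taken as 16 kHz, the four entropy values are taken to correspond to four 2000 Hz bands spanning 0 to 8000 Hz, and the function name `band_entropies` and its parameters are hypothetical.

```python
import numpy as np

def band_entropies(signal, sample_rate=16000, frame_len=400, hop=160,
                   band_width_hz=2000.0, max_hz=8000.0):
    """Shannon entropy (in bits) of the power distribution in each band.

    Assumption: an STFT power spectrum is used instead of a Gammatonegram,
    and bands are contiguous 2000 Hz intervals from 0 Hz up to max_hz.
    """
    signal = np.asarray(signal, dtype=float)
    # Split the signal into overlapping Hann-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum per frame, then summed over time for each frequency bin.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bin_power = power.sum(axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    edges = np.arange(0.0, max_hz + band_width_hz, band_width_hz)
    entropies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins with lo <= f < hi (the bin exactly at max_hz is left out).
        p = bin_power[(freqs >= lo) & (freqs < hi)]
        total = p.sum()
        if total <= 0.0:
            entropies.append(0.0)  # silent band: define entropy as zero
            continue
        p = p / total              # normalise to a probability distribution
        p = p[p > 0]               # skip empty bins to avoid log2(0)
        entropies.append(float(-np.sum(p * np.log2(p))))
    return np.array(entropies)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(16000)  # 1 s of white noise as a stand-in
    print(band_entropies(noise))
```

A band whose energy is spread evenly across its bins gives an entropy near the maximum (log2 of the number of bins), while a band dominated by a narrow peak gives a low value. Under the assumptions above, this is one way the differing energy ranges of /s/ and /sh/ could show up as different entropy values in the 2000 to 4000 Hz band.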
Acknowledgment
The authors would like to thank the Cognitive Science Research Initiative (CSRI), Department of Science & Technology, Government of India, Grant no. SR/CSRI/49/2015, for its financial support of this work.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ramteke, P.B., Supanekar, S., Aithal, V., Koolagudi, S.G. (2020). Identification of Palatal Fricative Fronting Using Shannon Entropy of Spectrogram. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_22
DOI: https://doi.org/10.1007/978-3-030-66187-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66186-1
Online ISBN: 978-3-030-66187-8
eBook Packages: Computer Science, Computer Science (R0)