Abstract
Stuttering is a neurodevelopmental disorder present in about \(1\%\) of the population. Dysfluency classification is still an open research question, with ongoing debate about which feature representation and which classifier to use. Another issue, which has been neglected so far, is how to deal with audio samples that contain more than one type of dysfluency. Research has mostly considered single-label problems, in part due to the lack of substantial multi-label datasets. However, the FluencyBank and SEP-28K datasets are now available and contain multi-label data, which should pave the way for more research taking this aspect into account.
In this paper, we give an overview of different ways to handle multi-label classification and compare them, while fine-tuning the ResNet50 network to perform multi-label dysfluency classification. We show that fine-tuning ResNet50, independently of the label representation, outperforms current state-of-the-art results.
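The multi-label setup described above can be illustrated with a minimal sketch: replace the ImageNet classification head of ResNet50 with one output per dysfluency type and train with a per-class binary cross-entropy loss, so several labels can be active for the same clip. This is not the authors' exact configuration; the input features (3-channel spectrogram images), class count, and hyperparameters below are assumptions for illustration only.

```python
# Hypothetical sketch: fine-tuning ResNet50 for multi-label dysfluency classification.
# Input features, number of classes, and hyperparameters are assumptions,
# not the configuration reported in the paper.
import torch
import torch.nn as nn
from torchvision import models

NUM_DYSFLUENCY_TYPES = 5  # assumed number of dysfluency categories

# Load an ImageNet-pretrained ResNet50 and replace the classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_DYSFLUENCY_TYPES)

# Multi-label training: one logit per dysfluency type,
# trained against multi-hot targets with binary cross-entropy.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(spectrograms: torch.Tensor, targets: torch.Tensor) -> float:
    """spectrograms: (B, 3, H, W) spectrogram images; targets: (B, NUM_DYSFLUENCY_TYPES) multi-hot."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(spectrograms), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(spectrograms: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Independent per-class probabilities, so several dysfluency types
    can be predicted for the same audio sample."""
    model.eval()
    with torch.no_grad():
        return (torch.sigmoid(model(spectrograms)) > threshold).int()
```

Because the sigmoid outputs are thresholded independently, a clip containing, for example, both a prolongation and a repetition can receive both labels at once, which is the behavior a single-label softmax classifier cannot express.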
References
Aravind, P., Nechiyil, U., Paramparambath, N., et al.: Audio spoofing verification using deep convolutional neural networks by transfer learning. arXiv preprint arXiv:2008.03464 (2020)
Chee, L.S., Ai, O.C., Hariharan, M., Yaacob, S.: MFCC based recognition of repetitions and prolongations in stuttered speech using k-NN and LDA. In: 2009 IEEE Student Conference on Research and Development (SCOReD), pp. 146–149. IEEE (2009)
Deng, J., Zhang, Z., Marchi, E., Schuller, B.: Sparse autoencoder-based feature transfer learning for speech emotion recognition. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 511–516. IEEE (2013)
Geetha, Y., Pratibha, K., Ashok, R., Ravindra, S.K.: Classification of childhood disfluencies using neural networks. J. Fluen. Disord. 25(2), 99–117 (2000)
Georgila, K.: Using integer linear programming for detecting speech disfluencies. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pp. 109–112 (2009)
Gerczuk, M., Amiriparian, S., Ottl, S., Schuller, B.W.: EmoNet: a transfer learning framework for multi-corpus speech emotion recognition. IEEE Trans. Affect. Comput. (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Howell, P., Davis, S., Bartrip, J.: The University College London Archive of Stuttered Speech (UCLASS) (2009)
Howell, P., Sackin, S., Glenn, K.: Development of a two-stage procedure for the automatic recognition of dysfluencies in the speech of children who stutter: I. Psychometric procedures appropriate for selection of training material for lexical dysfluency classifiers. J. Speech Lang. Hear. Res. 40(5), 1073–1084 (1997)
Jouaiti, M., Dautenhahn, K.: Dysfluency classification in stuttered speech using deep learning for real-time applications. In: ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6482–6486 (2022). https://doi.org/10.1109/ICASSP43922.2022.9746638
Kourkounakis, T., Hajavi, A., Etemad, A.: Detecting multiple speech disfluencies using a deep residual network with bidirectional long short-term memory. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6089–6093. IEEE (2020)
Kourkounakis, T., Hajavi, A., Etemad, A.: FluentNet: end-to-end detection of speech disfluency with deep learning. arXiv preprint arXiv:2009.11394 (2020)
Kunze, J., Kirsch, L., Kurenkov, I., Krug, A., Johannsmeier, J., Stober, S.: Transfer learning for speech recognition on a budget. arXiv preprint arXiv:1706.00290 (2017)
Latif, S., Rana, R., Younis, S., Qadir, J., Epps, J.: Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353 (2018)
Lea, C., Mitra, V., Joshi, A., Kajarekar, S., Bigham, J.P.: Sep-28k: a dataset for stuttering event detection from podcasts with people who stutter. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6798–6802. IEEE (2021)
Mahesha, P., Vinod, D.S.: Classification of speech dysfluencies using speech parameterization techniques and multiclass SVM. In: Singh, K., Awasthi, A.K. (eds.) QShine 2013. LNICST, vol. 115, pp. 298–308. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37949-9_26
Marcinek, L., Stone, M., Millman, R., Gaydecki, P.: N-MTTL SI model: non-intrusive multi-task transfer learning-based speech intelligibility prediction model with scenery classification. In: Interspeech (2021)
Matassoni, M., Gretter, R., Falavigna, D., Giuliani, D.: Non-native children speech recognition through transfer learning. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6229–6233. IEEE (2018)
Oue, S., Marxer, R., Rudzicz, F.: Automatic dysfluency detection in dysarthric speech using deep belief networks. In: Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies, pp. 60–64 (2015)
Padi, S., Sadjadi, S.O., Sriram, R.D., Manocha, D.: Improved speech emotion recognition using transfer learning and spectrogram augmentation. In: Proceedings of the 2021 International Conference on Multimodal Interaction, pp. 645–652 (2021)
Ratner, N.B., MacWhinney, B.: Fluency bank: a new resource for fluency research and practice. J. Fluen. Disord. 56, 69–80 (2018)
Ravikumar, K., Rajagopal, R., Nagaraj, H.: An approach for objective assessment of stuttered speech using MFCC. In: The International Congress for Global Science and Technology, p. 19 (2009)
Ravikumar, K., Reddy, B., Rajagopal, R., Nagaraj, H.: Automatic detection of syllable repetition in read speech for objective assessment of stuttered disfluencies. Proc. World Acad. Sci. Eng. Technol. 36, 270–273 (2008)
Santoso, J., Yamada, T., Makino, S.: Categorizing error causes related to utterance characteristics in speech recognition. Proc. NCSP 19, 514–517 (2019)
Sheikh, S.A., Sahidullah, M., Hirsch, F., Ouni, S.: StutterNet: stuttering detection using time delay neural network. arXiv preprint arXiv:2105.05599 (2021)
Sheikh, S.A., Sahidullah, M., Hirsch, F., Ouni, S.: Machine learning for stuttering identification: review, challenges & future directions. arXiv preprint arXiv:2107.04057 (2021)
Shivakumar, P.G., Georgiou, P.: Transfer learning from adult to children for speech recognition: evaluation, analysis and recommendations. Comput. Speech Lang. 63, 101077 (2020)
Suszyński, W., Kuniszyk-Jóźkowiak, W., Smołka, E., Dzieńkowski, M.: Prolongation detection with application of fuzzy logic. Ann. Universitatis Mariae Curie-Sklodowska Sectio AI-Informatica 1(1), 1–8 (2015)
Szczurowska, I., Kuniszyk-Jóźkowiak, W., Smołka, E.: The application of Kohonen and multilayer perceptron networks in the speech nonfluency analysis. Arch. Acoust. 31(4(S)), 205–210 (2014)
Villegas, B., Flores, K.M., Acuña, K.J., Pacheco-Barrios, K., Elias, D.: A novel stuttering disfluency classification system based on respiratory biosignals. In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4660–4663. IEEE (2019)
Wang, D., Zheng, T.F.: Transfer learning for speech and language processing. In: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1225–1237. IEEE (2015)
Weiss, K., Khoshgoftaar, T.M., Wang, D.D.: A survey of transfer learning. J. Big Data 3(1), 1–40 (2016). https://doi.org/10.1186/s40537-016-0043-6
Yildirim, S., Narayanan, S.: Automatic detection of disfluency boundaries in spontaneous speech of children using audio-visual information. IEEE Trans. Audio Speech Lang. Process. 17(1), 2–12 (2009)
Zayats, V., Ostendorf, M., Hajishirzi, H.: Disfluency detection using a bidirectional LSTM. arXiv preprint arXiv:1604.03209 (2016)
Acknowledgments
This research was undertaken, in part, thanks to funding from the Canada 150 Research Chairs Program.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Jouaiti, M., Dautenhahn, K. (2022). Multi-label Dysfluency Classification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_25
DOI: https://doi.org/10.1007/978-3-031-20980-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science, Computer Science (R0)