Abstract
Stuttering is a neurodevelopmental disorder present in about \(1\%\) of the population. Dysfluency classification is still an open research question, with ongoing debate about which feature representation and which classifier to use. Another issue, which has been neglected so far, is how to deal with audio samples that contain more than one type of dysfluency. Research has mostly considered single-label problems, in part due to the lack of substantial multi-label datasets. However, the FluencyBank and SEP-28K datasets are now available and contain multi-label data, which should pave the way for more research taking this aspect into account.
In this paper, we give an overview of different ways to handle multi-label classification and compare them, while fine-tuning the ResNet50 network to perform multi-label dysfluency classification. We show that fine-tuning ResNet50, independently of the label representation, outperforms current state-of-the-art results.
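The multi-label setup described above can be illustrated with a minimal sketch: replace the ImageNet classification head of ResNet50 with one output per dysfluency type and train with a per-class binary cross-entropy loss, so several labels can be active for the same clip. This is not the authors' exact configuration; the input features (3-channel spectrogram images), class count, and hyperparameters below are assumptions for illustration only.

```python
# Hypothetical sketch: fine-tuning ResNet50 for multi-label dysfluency classification.
# Input features, number of classes, and hyperparameters are assumptions,
# not the configuration reported in the paper.
import torch
import torch.nn as nn
from torchvision import models

NUM_DYSFLUENCY_TYPES = 5  # assumed number of dysfluency categories

# Load an ImageNet-pretrained ResNet50 and replace the classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_DYSFLUENCY_TYPES)

# Multi-label training: one logit per dysfluency type,
# trained against multi-hot targets with binary cross-entropy.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(spectrograms: torch.Tensor, targets: torch.Tensor) -> float:
    """spectrograms: (B, 3, H, W) spectrogram images; targets: (B, NUM_DYSFLUENCY_TYPES) multi-hot."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(spectrograms), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(spectrograms: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Independent per-class probabilities, so several dysfluency types
    can be predicted for the same audio sample."""
    model.eval()
    with torch.no_grad():
        return (torch.sigmoid(model(spectrograms)) > threshold).int()
```

Because the sigmoid outputs are thresholded independently, a clip containing, for example, both a prolongation and a repetition can receive both labels at once, which is the behavior a single-label softmax classifier cannot express.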
References
Aravind, P., Nechiyil, U., Paramparambath, N., et al.: Audio spoofing verification using deep convolutional neural networks by transfer learning. arXiv preprint arXiv:2008.03464 (2020)
Chee, L.S., Ai, O.C., Hariharan, M., Yaacob, S.: MFCC based recognition of repetitions and prolongations in stuttered speech using k-NN and LDA. In: 2009 IEEE Student Conference on Research and Development (SCOReD), pp. 146–149. IEEE (2009)
Deng, J., Zhang, Z., Marchi, E., Schuller, B.: Sparse autoencoder-based feature transfer learning for speech emotion recognition. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 511–516. IEEE (2013)
Geetha, Y., Pratibha, K., Ashok, R., Ravindra, S.K.: Classification of childhood disfluencies using neural networks. J. Fluen. Disord. 25(2), 99–117 (2000)
Georgila, K.: Using integer linear programming for detecting speech disfluencies. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pp. 109–112 (2009)
Gerczuk, M., Amiriparian, S., Ottl, S., Schuller, B.W.: EmoNet: a transfer learning framework for multi-corpus speech emotion recognition. IEEE Trans. Affect. Comput. (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Howell, P., Davis, S., Bartrip, J.: The University College London Archive of Stuttered Speech (UCLASS) (2009)
Howell, P., Sackin, S., Glenn, K.: Development of a two-stage procedure for the automatic recognition of dysfluencies in the speech of children who stutter: I. Psychometric procedures appropriate for selection of training material for lexical dysfluency classifiers. J. Speech Lang. Hear. Res. 40(5), 1073–1084 (1997)
Jouaiti, M., Dautenhahn, K.: Dysfluency classification in stuttered speech using deep learning for real-time applications. In: ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6482–6486 (2022). https://doi.org/10.1109/ICASSP43922.2022.9746638
Kourkounakis, T., Hajavi, A., Etemad, A.: Detecting multiple speech disfluencies using a deep residual network with bidirectional long short-term memory. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6089–6093. IEEE (2020)
Kourkounakis, T., Hajavi, A., Etemad, A.: FluentNet: end-to-end detection of speech disfluency with deep learning. arXiv preprint arXiv:2009.11394 (2020)
Kunze, J., Kirsch, L., Kurenkov, I., Krug, A., Johannsmeier, J., Stober, S.: Transfer learning for speech recognition on a budget. arXiv preprint arXiv:1706.00290 (2017)
Latif, S., Rana, R., Younis, S., Qadir, J., Epps, J.: Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353 (2018)
Lea, C., Mitra, V., Joshi, A., Kajarekar, S., Bigham, J.P.: Sep-28k: a dataset for stuttering event detection from podcasts with people who stutter. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6798–6802. IEEE (2021)
Mahesha, P., Vinod, D.S.: Classification of speech dysfluencies using speech parameterization techniques and multiclass SVM. In: Singh, K., Awasthi, A.K. (eds.) QShine 2013. LNICST, vol. 115, pp. 298–308. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37949-9_26
Marcinek, L., Stone, M., Millman, R., Gaydecki, P.: N-MTTL SI model: non-intrusive multi-task transfer learning-based speech intelligibility prediction model with scenery classification. In: Interspeech (2021)
Matassoni, M., Gretter, R., Falavigna, D., Giuliani, D.: Non-native children speech recognition through transfer learning. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6229–6233. IEEE (2018)
Oue, S., Marxer, R., Rudzicz, F.: Automatic dysfluency detection in dysarthric speech using deep belief networks. In: Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies, pp. 60–64 (2015)
Padi, S., Sadjadi, S.O., Sriram, R.D., Manocha, D.: Improved speech emotion recognition using transfer learning and spectrogram augmentation. In: Proceedings of the 2021 International Conference on Multimodal Interaction, pp. 645–652 (2021)
Ratner, N.B., MacWhinney, B.: Fluency bank: a new resource for fluency research and practice. J. Fluen. Disord. 56, 69–80 (2018)
Ravikumar, K., Rajagopal, R., Nagaraj, H.: An approach for objective assessment of stuttered speech using MFCC. In: The International Congress for Global Science and Technology, p. 19 (2009)
Ravikumar, K., Reddy, B., Rajagopal, R., Nagaraj, H.: Automatic detection of syllable repetition in read speech for objective assessment of stuttered disfluencies. Proc. World Acad. Sci. Eng. Technol. 36, 270–273 (2008)
Santoso, J., Yamada, T., Makino, S.: Categorizing error causes related to utterance characteristics in speech recognition. Proc. NCSP 19, 514–517 (2019)
Sheikh, S.A., Sahidullah, M., Hirsch, F., Ouni, S.: StutterNet: stuttering detection using time delay neural network. arXiv preprint arXiv:2105.05599 (2021)
Sheikh, S.A., Sahidullah, M., Hirsch, F., Ouni, S.: Machine learning for stuttering identification: review, challenges & future directions. arXiv preprint arXiv:2107.04057 (2021)
Shivakumar, P.G., Georgiou, P.: Transfer learning from adult to children for speech recognition: evaluation, analysis and recommendations. Comput. Speech Lang. 63, 101077 (2020)
Suszyński, W., Kuniszyk-Jóźkowiak, W., Smołka, E., Dzieńkowski, M.: Prolongation detection with application of fuzzy logic. Ann. Universitatis Mariae Curie-Sklodowska Sectio AI-Informatica 1(1), 1–8 (2015)
Szczurowska, I., Kuniszyk-Jóźkowiak, W., Smołka, E.: The application of Kohonen and multilayer perceptron networks in the speech nonfluency analysis. Arch. Acoust. 31(4(S)), 205–210 (2014)
Villegas, B., Flores, K.M., Acuña, K.J., Pacheco-Barrios, K., Elias, D.: A novel stuttering disfluency classification system based on respiratory biosignals. In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4660–4663. IEEE (2019)
Wang, D., Zheng, T.F.: Transfer learning for speech and language processing. In: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1225–1237. IEEE (2015)
Weiss, K., Khoshgoftaar, T.M., Wang, D.D.: A survey of transfer learning. J. Big Data 3(1), 1–40 (2016). https://doi.org/10.1186/s40537-016-0043-6
Yildirim, S., Narayanan, S.: Automatic detection of disfluency boundaries in spontaneous speech of children using audio-visual information. IEEE Trans. Audio Speech Lang. Process. 17(1), 2–12 (2009)
Zayats, V., Ostendorf, M., Hajishirzi, H.: Disfluency detection using a bidirectional LSTM. arXiv preprint arXiv:1604.03209 (2016)
Acknowledgments
This research was undertaken, in part, thanks to funding from the Canada 150 Research Chairs Program.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Jouaiti, M., Dautenhahn, K. (2022). Multi-label Dysfluency Classification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_25
DOI: https://doi.org/10.1007/978-3-031-20980-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science, Computer Science (R0)