Multi-label Dysfluency Classification

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

Stuttering is a neuro-developmental disorder that affects about 1% of the population. Dysfluency classification is still an open research question, with no consensus on which feature representation or which classifier to use. Another issue, which has been neglected so far, is how to deal with audio samples that contain more than one type of dysfluency. Research has mostly considered only single-label problems, in part due to the lack of substantial multi-label datasets. However, the FluencyBank and SEP-28K datasets are now available and contain multi-label data, which should pave the way for more research taking this aspect into account.

In this paper, we give an overview of different ways to handle multi-label classification and compare them while fine-tuning the ResNet50 network to perform multi-label dysfluency classification. We show that fine-tuning ResNet50, regardless of the label representation, performs better than current state-of-the-art results.
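To make the idea of a "label representation" concrete, the sketch below encodes the same multi-label annotations in two common ways: a multi-hot vector (one independent binary target per dysfluency type) and a label powerset (one class per distinct combination of labels). This is only an illustration under stated assumptions: the label names, the helper functions, and the 0.5 decision threshold are hypothetical and are not taken from the paper, which may compare different or additional representations.

```python
# Illustrative label set (an assumption, not taken from the paper).
LABELS = ["block", "prolongation", "sound_rep", "word_rep", "interjection"]


def to_multi_hot(active, labels=LABELS):
    """Multi-hot encoding: one independent 0/1 target per dysfluency type."""
    unknown = set(active) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in labels]


def build_powerset_index(samples, labels=LABELS):
    """Label powerset: give each distinct label combination its own class id.

    Class ids are assigned in order of first appearance in `samples`.
    """
    index = {}
    for active in samples:
        key = tuple(to_multi_hot(active, labels))
        if key not in index:
            index[key] = len(index)
    return index


def decode_multi_hot(scores, threshold=0.5, labels=LABELS):
    """Turn per-label scores (e.g. sigmoid outputs) into a set of label names."""
    return {name for name, score in zip(labels, scores) if score >= threshold}


if __name__ == "__main__":
    # Hypothetical annotations for four audio clips; the empty set is a fluent clip.
    clips = [{"block"}, {"block", "interjection"}, set(), {"block"}]
    for clip in clips:
        print(sorted(clip), "->", to_multi_hot(clip))
    print("powerset classes:", build_powerset_index(clips))
```

With the multi-hot form, a network would typically end in one sigmoid output per label, so a clip can trigger several labels at once. With the powerset form, the problem becomes ordinary single-label classification, at the cost of a class count that grows with the number of label combinations observed.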

Acknowledgments

This research was undertaken, in part, thanks to funding from the Canada 150 Research Chairs Program.

Author information

Corresponding author

Correspondence to Melanie Jouaiti.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Jouaiti, M., Dautenhahn, K. (2022). Multi-label Dysfluency Classification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_25

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science (R0)
