Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese

  • Conference paper
Artificial Intelligence in Medicine (AIME 2023)

Abstract

This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audio, thus treating speech as an RI biomarker. Previous works [2, 6] collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved \(96.5\%\) accuracy, showing the feasibility of RI detection via AI. Here, we collect data (P2) from RI patients with several causes besides COVID-19, aiming to extend AI-based RI detection. We also collect control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.
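The abstract's claims rest on classification metrics for RI versus control, and on comparing in-domain performance (P1) with cross-dataset performance (P2). As a minimal sketch, and not the authors' evaluation code, assuming binary labels with 1 = RI and 0 = control (our encoding), accuracy and F1-score can be computed from a model's predictions as below. Applying the same functions once to a P1 test split and once to P2 would expose a generalization gap; F1 focuses on the RI class, whereas accuracy alone can look high on class-imbalanced data. The labels in the usage lines are made up for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # Common convention: F1 is 0 when no positive example is correctly found.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up labels (1 = RI, 0 = control), not data from the paper.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    print(f"accuracy={accuracy(y_true, y_pred):.3f}")  # accuracy=0.667
    print(f"F1={f1_score(y_true, y_pred):.3f}")  # F1=0.667
```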

Partly supported by FAPESP grants 2020/16543-7 and 2020/06443-5, and by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. Carried out at the Center for Artificial Intelligence (C4AI-USP), supported by FAPESP grant 2019/07665-4 and by the IBM Corporation.

Notes

  1. Initial tests attain above \(95\%\) accuracy (above 0.93 F1-score) in all 4 networks when training and testing on P2 data. So P2 is not harder; it is only different.

  2. “O amor ao próximo ajuda a enfrentar essa fase com a força que a gente precisa” (“Love for one's neighbor helps us face this phase with the strength we need”).

  3. Resampling the audio makes only a minimal difference in performance.

  4. Again, we use 20 epochs, batch size 16, and learning rate \(10^{-4}\), and the best models are saved.

  5. ‘O’ (Other) and ‘CM’ represent controls. The other hospitals refer only to patients.

  6. Other angles do not add much. Using the PANNs yields similar results.
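Note 4 gives the training setup: 20 epochs, batch size 16, learning rate \(10^{-4}\), with the best models saved. A minimal, framework-agnostic sketch of such a keep-the-best loop follows. The selection rule (a higher validation score is better) and the hypothetical callables `train_epoch` and `validate` are assumptions standing in for whatever training step and validation metric the authors' framework actually uses; only the three hyperparameter values come from the note.

```python
import copy
from typing import Any, Callable, Dict, Tuple

# Values from note 4; the rest of this loop is an illustrative assumption.
EPOCHS = 20
BATCH_SIZE = 16
LEARNING_RATE = 1e-4

State = Dict[str, Any]


def train_keep_best(
    state: State,
    train_epoch: Callable[[State, int, float], None],
    validate: Callable[[State], float],
    epochs: int = EPOCHS,
    batch_size: int = BATCH_SIZE,
    lr: float = LEARNING_RATE,
) -> Tuple[State, float, int]:
    """Train for a fixed number of epochs and return a snapshot of the
    best-scoring state, its validation score, and its 0-based epoch index."""
    best_state = copy.deepcopy(state)
    best_score = float("-inf")
    best_epoch = -1
    for epoch in range(epochs):
        train_epoch(state, batch_size, lr)  # hypothetical: mutates state in place
        score = validate(state)  # hypothetical: higher is better
        if score > best_score:
            # Deep-copy so later epochs cannot overwrite the saved snapshot.
            best_state, best_score, best_epoch = copy.deepcopy(state), score, epoch
    return best_state, best_score, best_epoch


if __name__ == "__main__":
    # Toy stand-ins: each "epoch" increments one parameter, and the
    # validation score peaks when that parameter equals 5.
    def toy_train(state: State, batch_size: int, lr: float) -> None:
        state["w"] += 1

    def toy_validate(state: State) -> float:
        return float(-((state["w"] - 5) ** 2))

    best, score, epoch = train_keep_best({"w": 0}, toy_train, toy_validate)
    print(best, score, epoch)  # {'w': 5} 0.0 4
```

Snapshotting with `copy.deepcopy` plays the role that writing a checkpoint to disk would play in a real training framework: the final state after epoch 20 can be worse than the saved one.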

References

  1. Aluísio, S.M., Camargo Neto, A.C.d., et al.: Detecting respiratory insufficiency via voice analysis: the SPIRA project. In: Practical Machine Learning for Developing Countries at ICLR 2022. Proceeding. ICLR (2022)

  2. Casanova, E., Gris, L., et al.: Deep learning against COVID-19: respiratory insufficiency detection in Brazilian Portuguese speech. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 625–633. ACL, August 2021

  3. Devlin, J., Chang, M.W., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  4. Fernandes-Svartman, F., Berti, L., et al.: Temporal prosodic cues for COVID-19 in Brazilian Portuguese speakers. In: Proceedings of Speech Prosody 2022, pp. 210–214 (2022)

  5. Gauy, M., Finger, M.: Acoustic models for Brazilian Portuguese speech based on neural transformers (2023, submitted for publication)

  6. Gauy, M.M., Finger, M.: Audio MFCC-gram transformers for respiratory insufficiency detection in COVID-19. In: STIL 2021, November 2021

  7. Gauy, M.M., Finger, M.: Pretrained audio neural networks for speech emotion recognition in Portuguese. In: Automatic Speech Recognition for Spontaneous and Prepared Speech & Speech Emotion Recognition in Portuguese. CEUR-WS (2022)

  8. Gemmeke, J.F., Ellis, D.P., et al.: Audio set: an ontology and human-labeled dataset for audio events. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776–780. IEEE (2017)

  9. Gong, Y., Lai, C.I., et al.: SSAST: self-supervised audio spectrogram transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 10699–10709 (2022)

  10. Khan, S., Naseer, M., et al.: Transformers in vision: a survey. ACM Comput. Surv. 54(10s) (2022)

  11. Kong, Q., Cao, Y., et al.: PANNs: large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2880–2894 (2020)

  12. Liu, A.T., Yang, S.W., et al.: Mockingjay: unsupervised speech representation learning with deep bidirectional transformer encoders. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6419–6423. IEEE (2020)

  13. Robotti, C., Costantini, G., et al.: Machine learning-based voice assessment for the detection of positive and recovered COVID-19 patients. J. Voice (2021)

  14. da Silva, D.P.P., Casanova, E., et al.: Interpretability analysis of deep models for COVID-19 detection. arXiv preprint arXiv:2211.14372 (2022)

  15. Vaswani, A., Shazeer, N., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017)

Author information

Correspondence to Marcelo Matheus Gauy.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gauy, M.M. et al. (2023). Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese. In: Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A. (eds) Artificial Intelligence in Medicine. AIME 2023. Lecture Notes in Computer Science, vol 13897. Springer, Cham. https://doi.org/10.1007/978-3-031-34344-5_32

  • DOI: https://doi.org/10.1007/978-3-031-34344-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34343-8

  • Online ISBN: 978-3-031-34344-5

  • eBook Packages: Computer Science, Computer Science (R0)
