Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese

Gauy, Marcelo Matheus; Berti, Larissa Cristina; Cândido, Arnaldo; Neto, Augusto Camargo; Goldman, Alfredo; Levin, Anna Sara Shafferman; Martins, Marcus; de Medeiros, Beatriz Raposo; Queiroz, Marcelo; Sabino, Ester Cerdeira; Svartman, Flaviane Romani Fernandes; Finger, Marcelo

doi:10.1007/978-3-031-34344-5_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13897)

Included in the following conference series:

International Conference on Artificial Intelligence in Medicine

Abstract

This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audio, thus treating speech as an RI biomarker. Previous works [2, 6] collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved \(96.5\%\) accuracy, showing the feasibility of RI detection via AI. Here, we collect data (P2) from patients whose RI has several causes besides COVID-19, aiming to extend AI-based RI detection. We also collect control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.

Partly supported by FAPESP grants 2020/16543-7 and 2020/06443-5, and by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. Carried out at the Center for Artificial Intelligence (C4AI-USP), supported by FAPESP grant 2019/07665-4 and by the IBM Corporation.

Notes

1. Initial tests attain above \(95\%\) accuracy (above 0.93 F1-score) when training and testing on P2 data in all 4 networks. So P2 is not harder; it is only different.
2. “O amor ao próximo ajuda a enfrentar essa fase com a força que a gente precisa” (“Love for one's neighbor helps us face this phase with the strength we need”).
3. The performance difference from resampling the audios is minimal.
4. Again, we use 20 epochs, batch size 16, learning rate \(10^{-4}\), and the best models are saved.
5. ‘O’ (Other) and ‘CM’ represent controls. The other hospitals refer only to patients.
6. Other angles do not add much. Using the PANNs yields similar results.
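
Notes 1 and 4 mention accuracy, F1-score, and a training setup (20 epochs, batch size 16, learning rate \(10^{-4}\), best models saved). The following is a minimal sketch of how such quantities might be computed and how a best epoch might be tracked. It is not the authors' code: the names TrainConfig, accuracy_and_f1, and select_best_epoch are hypothetical, and the choice of "highest validation score" as the criterion for the best model is an assumption, since the notes do not state the criterion.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in note 4; the class itself is hypothetical."""
    epochs: int = 20
    batch_size: int = 16
    learning_rate: float = 1e-4


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, with `positive` as the RI class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def select_best_epoch(
    train_one_epoch: Callable[[int], None],
    validate: Callable[[int], float],
    config: TrainConfig,
) -> Tuple[int, float]:
    """Run config.epochs epochs and return (best_epoch, best_score).

    Assumption: "best" means the highest validation score. A real training
    script would write a model checkpoint wherever the score improves.
    """
    best_epoch, best_score = -1, float("-inf")
    for epoch in range(config.epochs):
        train_one_epoch(epoch)
        score = validate(epoch)
        if score > best_score:
            best_epoch, best_score = epoch, score
    return best_epoch, best_score


if __name__ == "__main__":
    acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
    print(f"accuracy={acc:.2f} f1={f1:.2f}")
```

In this sketch the training and validation steps are passed in as callables, so the same selection loop could wrap any of the networks; the batch size and learning rate are carried in the config for whatever optimizer and data loader the caller supplies.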

References

1. Aluísio, S.M., Camargo Neto, A.C.d, et al.: Detecting respiratory insufficiency via voice analysis: the SPIRA project. In: Practical Machine Learning for Developing Countries at ICLR 2022. Proceeding. ICLR (2022)
2. Casanova, E., Gris, L., et al.: Deep learning against COVID-19: respiratory insufficiency detection in Brazilian Portuguese speech. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 625–633. ACL, August 2021
3. Devlin, J., Chang, M.W., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
4. Fernandes-Svartman, F., Berti, L., et al.: Temporal prosodic cues for COVID-19 in Brazilian Portuguese speakers. In: Proceedings of Speech Prosody 2022, pp. 210–214 (2022)
5. Gauy, M., Finger, M.: Acoustic models for Brazilian Portuguese speech based on neural transformers (2023, submitted for publication)
6. Gauy, M.M., Finger, M.: Audio MFCC-gram transformers for respiratory insufficiency detection in COVID-19. In: STIL 2021, November 2021
7. Gauy, M.M., Finger, M.: Pretrained audio neural networks for speech emotion recognition in Portuguese. In: Automatic Speech Recognition for Spontaneous and Prepared Speech Speech Emotion Recognition in Portuguese. CEUR-WS (2022)
8. Gemmeke, J.F., Ellis, D.P., et al.: Audio set: an ontology and human-labeled dataset for audio events. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776–780. IEEE (2017)
9. Gong, Y., Lai, C.I., et al.: SSAST: self-supervised audio spectrogram transformer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 10699–10709 (2022)
10. Khan, S., Naseer, M., et al.: Transformers in vision: a survey. ACM Comput. Surv. 54(10s) (2022)
11. Kong, Q., Cao, Y., et al.: PANNs: large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2880–2894 (2020)
12. Liu, A.T., Yang, S.W, et al.: Mockingjay: unsupervised speech representation learning with deep bidirectional transformer encoders. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6419–6423. IEEE (2020)
13. Robotti, C., Costantini, G., et al.: Machine learning-based voice assessment for the detection of positive and recovered COVID-19 patients. J. Voice (2021)
14. da Silva, D.P.P., Casanova, E., et al.: Interpretability analysis of deep models for COVID-19 detection. arXiv preprint arXiv:2211.14372 (2022)
15. Vaswani, A., Shazeer, N., et al.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 5998–6008 (2017)

Author information

Authors and Affiliations

Universidade de São Paulo, Butanta, São Paulo, SP, Brazil
Marcelo Matheus Gauy, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman & Marcelo Finger
Universidade Estadual Paulista, Marília, SP, Brazil
Larissa Cristina Berti
Universidade Estadual Paulista, São José do Rio Preto, SP, Brazil
Arnaldo Cândido Jr

Corresponding author

Correspondence to Marcelo Matheus Gauy.

Editor information

Editors and Affiliations

University of Murcia, Murcia, Spain
Jose M. Juarez
Universitat Jaume I, Castellón de la Plana, Spain
Mar Marcos
University of Maribor, Maribor, Slovenia
Gregor Stiglic
Brunel University London, Uxbridge, UK
Allan Tucker

Cite this paper

Gauy, M.M. et al. (2023). Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese. In: Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A. (eds) Artificial Intelligence in Medicine. AIME 2023. Lecture Notes in Computer Science, vol 13897. Springer, Cham. https://doi.org/10.1007/978-3-031-34344-5_32

DOI: https://doi.org/10.1007/978-3-031-34344-5_32
Published: 05 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34343-8
Online ISBN: 978-3-031-34344-5
eBook Packages: Computer Science, Computer Science (R0)
