Is Voice the New Blood? AI and Voice Biomarkers for Earlier Diagnosis: for Everyone, Everywhere, at Any Time

Schuller, Dagmar M.; Schuller, Björn W.

doi:10.1007/978-3-658-33597-7_26

Abstract

When a person speaks or produces sounds, the signal carries not only the content but also other important characteristics that allow conclusions to be drawn about the speaker's traits and state. How something is said often conveys a far more important message than the content itself. Beyond gender, age, and dialect, the voice can thus reveal emotional states, personality traits, speech disorders and, in particular, indications of disease. Since the early 2000s, the field of Computer Audition, including Spoken Language Processing (SLP) and Computational Paralinguistics (CP), has increasingly addressed these characteristics. Human sound and speech production is a complex system involving a large number of muscle groups and organs. Impairment of one or more of these muscles or organs disturbs production, which can be perceived as a dysfunction or anomaly in the audio signal. Equally complex is the control of these muscle groups by the cognitive system, whose disturbance is likewise "audible" in the audio signal. In addition, anatomical and physiological conditions shape the sound and can accordingly be "heard". Through the use of machine learning, in particular deep neural networks, alongside other machine learning methods and artificial intelligence (AI) more broadly, increasingly robust recognition performance has been achieved in recent years in diagnosing diseases and symptoms from human sounds and spoken language.
This chapter gives a brief insight into how the approach works, shows the existing possibilities for applying AI-based audio analysis in healthcare, in particular for neurodegenerative, neurocognitive, neurodevelopmental, and mental as well as respiratory diseases, and offers an outlook on future developments.

Author information

Authors and Affiliations

audEERING GmbH, Gilching, Germany
Dagmar M. Schuller
Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
Björn W. Schuller

Corresponding author

Correspondence to Dagmar M. Schuller.

Editor information

Editors and Affiliations

Neu-Ulm University of Applied Sciences, Faculty of Health Management, Neu-Ulm, Germany
Mario A. Pfannstiel

About this chapter

Cite this chapter

Schuller, D.M., Schuller, B.W. (2022). Ist Stimme das neue Blut? KI und Stimmbiomarker zu früheren Diagnose – für jedermann, überall und jederzeit. In: Pfannstiel, M.A. (eds) Künstliche Intelligenz im Gesundheitswesen. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-33597-7_26

DOI: https://doi.org/10.1007/978-3-658-33597-7_26
Published: 17 March 2022
Publisher Name: Springer Gabler, Wiesbaden
Print ISBN: 978-3-658-33596-0
Online ISBN: 978-3-658-33597-7
eBook Packages: Business and Economics (German Language)