Personalised Speech-Based Heart Rate Categorisation Using Weighted-Instance Learning

Authors: Alexander Kathan, Shahin Amiriparian, Alexander Gebhard, Andreas Triantafyllopoulos, Maurice Gerczuk, Björn W. Schuller

MMSports '23: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports

Pages 9-13

https://doi.org/10.1145/3606038.3616155

Published: 29 October 2023

Abstract

Running is one of the most popular sports and has many positive effects, but it also carries risks. Most injuries are caused by overexertion. To optimise training and prevent injuries, approaches are needed that make it easy to monitor training behaviour. Previous research has shown that heart rate (HR) can be automatically classified from speech data. However, real-world applications are challenging because individuals are heterogeneous, which is why we introduce a personalised HR classification approach in this work. In particular, we first identify the runners in the training set whose acoustic patterns (x-vectors) are similar to those of a runner in the test set. We further extract deep representations and hand-crafted features from the input data. Subsequently, using the computed similarity values, we adapt a Support Vector Machine (SVM) to each individual: we select the training runners with the lowest Euclidean distances to the test runner and weight their training samples more heavily when training the SVM. Our personalised approach yields a best relative improvement of 20.8% over a non-personalised model in a 5-class HR classification task. These results demonstrate the effectiveness of our approach and pave the way for real-world, personalised applications.
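The weighting step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not state how many similar runners are selected, how much more heavily their samples are weighted, or how x-vectors are aggregated per runner, so `k`, `boost`, the one-embedding-per-runner layout, and all function names and toy values here are assumptions.

```python
import math


def select_similar_runners(train_embeddings, test_embedding, k):
    """Return the k training runner IDs closest to the test runner.

    train_embeddings maps a runner ID to one speaker embedding
    (assumed here to be a single x-vector-like list of floats per runner).
    Closeness is measured with the Euclidean distance.
    """
    distances = {
        runner: math.dist(vector, test_embedding)
        for runner, vector in train_embeddings.items()
    }
    # Sort runner IDs by ascending distance and keep the k nearest.
    return sorted(distances, key=distances.get)[:k]


def sample_weights(sample_runner_ids, similar_runners, boost=2.0):
    """Give samples from similar runners a larger weight (hypothetical boost)."""
    similar = set(similar_runners)
    return [boost if runner in similar else 1.0 for runner in sample_runner_ids]


# Toy data: three training runners with 2-dimensional stand-in embeddings.
train_embeddings = {"r1": [0.0, 0.0], "r2": [1.0, 1.0], "r3": [5.0, 5.0]}
test_embedding = [0.9, 1.2]

# One entry per training sample, naming the runner who produced it.
sample_runner_ids = ["r1", "r1", "r2", "r3", "r3"]

similar = select_similar_runners(train_embeddings, test_embedding, k=2)
weights = sample_weights(sample_runner_ids, similar, boost=2.0)
```

The resulting per-sample weights could then be handed to any SVM implementation that accepts them, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. A separate model would be trained for each test runner, since the weights depend on that runner's embedding.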

Cited By

Kathan A, Amiriparian S, Christ L, Eulitz S, Schuller B (2024). Automatic Speech-Based Charisma Recognition and the Impact of Integrating Auxiliary Characteristics. 2024 IEEE Conference on Telepresence, 148-153. Online publication date: 16-Nov-2024. https://doi.org/10.1109/Telepresence63209.2024.10841640

Kathan A, Amiriparian S, Triantafyllopoulos A, Gebhard A, Milkus S, Hohmann J, Muderlak P, Schottdorf J, Musil R, Schuller B (2024). Personalised Speech-Based PTSD Prediction Using Weighted-Instance Learning. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 1-4. Online publication date: 15-Jul-2024. https://doi.org/10.1109/EMBC53108.2024.10782220

Index Terms

Personalised Speech-Based Heart Rate Categorisation Using Weighted-Instance Learning
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning

Published In

MMSports '23: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports

October 2023

174 pages

ISBN:9798400702693

DOI:10.1145/3606038

Program Chairs: Rainer Lienhart (University of Augsburg), Thomas B. Moeslund (Aalborg University), Hideo Saito (Keio University)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Deutsche Forschungsgemeinschaft
Zentrales Innovationsprogramm Mittelstand (ZIM)

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 29 of 49 submissions, 59%
