ABSTRACT
Running, one of the most popular sports, has many positive effects but also carries risks. Most running injuries are caused by overexertion. To optimise training and prevent injuries, approaches are needed that make it easy to monitor training behaviour. Previous research has shown that heart rate (HR) can be classified automatically from speech data. Real-world applications are challenging because individuals are heterogeneous, which is why we introduce a personalised HR classification approach in this work. In particular, we first determine which runners in the train set have acoustic patterns (x-vectors) similar to those of a given runner in the test set. Further, we extract deep representations and hand-crafted features from the input data. Subsequently, using the computed similarity values, we adapt a Support Vector Machine (SVM) to each individual: we choose the train-set runners with the lowest Euclidean distances to the test runner and weight their train samples more heavily during SVM training. Our personalised approach yields a best relative improvement of 20.8% over a non-personalised model in a 5-class HR classification task. These results demonstrate the effectiveness of our approach and pave the way for real-world, personalised applications.
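The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of averaging x-vectors per runner, the number of similar runners `k`, and the weight `boost` are all assumptions made for illustration, and the toy 2-dimensional vectors stand in for real x-vectors.

```python
import math
from statistics import fmean


def mean_vector(vectors):
    """Average equal-length embeddings into one runner-level vector."""
    return [fmean(dim) for dim in zip(*vectors)]


def instance_weights(train_xvectors, train_runner_ids, test_xvectors,
                     k=2, boost=2.0):
    """Return (weights, distances) for the training samples.

    Samples from the k training runners whose mean x-vector has the
    lowest Euclidean distance to the test runner's mean x-vector get
    weight `boost`; all other samples get 1.0.
    `k` and `boost` are hypothetical values, not taken from the paper.
    """
    target = mean_vector(test_xvectors)

    # Group training embeddings by runner.
    by_runner = {}
    for vec, rid in zip(train_xvectors, train_runner_ids):
        by_runner.setdefault(rid, []).append(vec)

    # Euclidean distance between each training runner and the test runner.
    distances = {rid: math.dist(mean_vector(vecs), target)
                 for rid, vecs in by_runner.items()}

    # The runners with the lowest distances are treated as "similar".
    similar = set(sorted(distances, key=distances.get)[:k])

    weights = [boost if rid in similar else 1.0 for rid in train_runner_ids]
    return weights, distances


# Toy example: runner "a" is closest to the test runner.
train_xvectors = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [1.0, 1.0]]
train_runner_ids = ["a", "a", "b", "c"]
test_xvectors = [[0.1, 0.1], [0.3, 0.1]]

weights, distances = instance_weights(
    train_xvectors, train_runner_ids, test_xvectors, k=1)
print(weights)  # [2.0, 2.0, 1.0, 1.0]
```

The resulting list could then be passed as per-sample weights to an SVM implementation that supports them, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, with one such weighted model trained per test runner.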
Index Terms
- Personalised Speech-Based Heart Rate Categorisation Using Weighted-Instance Learning