research-article

The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

Authors: Björn Schuller, Anton Batliner, Shahin Amiriparian, Christian Bergler, Maurice Gerczuk, Pauline Larrouy-Maestri, Sebastien Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta, Maria Pateraki, Marianne Sinka, Stephen Roberts

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 7120 - 7124

https://doi.org/10.1145/3503161.3551591

Published: 10 October 2022

Abstract

The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions. The Vocalisations and Stuttering Sub-Challenges require classification of human non-verbal vocalisations and of speech, respectively; the Activity Sub-Challenge targets human activity recognition beyond audio, using smartwatch sensor data; and the Mosquitoes Sub-Challenge requires mosquitoes to be detected. We describe the Sub-Challenges, the baseline feature extraction, and classifiers based on the 'usual' ComParE and BoAW features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, we add end-to-end sequential modelling and a log-mel-128-BNN.
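
For readers unfamiliar with the Bag-of-Audio-Words (BoAW) features named in the abstract, the general idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the challenge's baseline code: the function name `bag_of_audio_words`, the toy frame features, and the two-entry codebook are all hypothetical, and nearest-codeword assignment by Euclidean distance is only one common way to build the histogram.

```python
import numpy as np

def bag_of_audio_words(frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantise frame-level features against a codebook and return a
    normalised histogram of codeword counts (one fixed-length vector)."""
    # Squared Euclidean distance between every frame and every codeword.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest codeword for each frame.
    assignments = dists.argmin(axis=1)
    # Count how often each codeword was chosen.
    counts = np.bincount(assignments, minlength=len(codebook)).astype(float)
    # Normalise so that clips of different lengths are comparable.
    return counts / counts.sum()

# Hypothetical toy data: 4 frames of 2-dimensional features, 2 codewords.
frames = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.1, 0.1]])
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
histogram = bag_of_audio_words(frames, codebook)
print(histogram)
```

In this toy case, three of the four frames lie closest to the first codeword, so the histogram puts most of its mass on the first bin. In practice the codebook would be learnt from training data (for example by clustering or random sampling of frames), and the resulting fixed-length vector would be passed to a classifier.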

Index Terms

1. Computing methodologies
  1. Artificial intelligence
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022, 7537 pages

ISBN: 9781450392037

DOI: 10.1145/3503161

General Chairs: João Magalhães (NOVA University of Lisbon, Portugal), Alberto del Bimbo (University of Florence, Italy), Shin'ichi Satoh (National Institute of Informatics, Japan), Nicu Sebe (University of Trento, Italy)

Program Chairs: Xavier Alameda-Pineda (Inria, Grenoble, France), Qin Jin (Renmin University of China, China), Vincent Oria (New Jersey Institute of Technology, USA), Laura Toni (University College London, UK)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Gates Foundation
DFG's Reinhart Koselleck
European Union's Horizon 2020
Deutsche Forschungsgemeinschaft

Conference

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 20
Total Downloads: 294
Downloads (Last 12 months): 82
Downloads (Last 6 weeks): 2

Reflects downloads up to 17 Feb 2025

Cited By

J. Shen and X. Zhang. 2025. Individual-independent and cross-language detection of speech disfluencies in stuttering based on multi-adversarial tasks and self-training. Biomedical Signal Processing and Control, Vol. 100 (2025), 107051. Online publication date: Feb 2025.
https://doi.org/10.1016/j.bspc.2024.107051
Z. Yin, X. Xu, and B. Schuller. 2025. Request and complaint recognition in call-center speech using a pointwise-convolution recurrent network. International Journal of Speech Technology (2025). Online publication date: 5 Feb 2025.
https://doi.org/10.1007/s10772-025-10171-7
P. Kapetanidis, F. Kalioras, C. Tsakonas, P. Tzamalis, G. Kontogiannis, T. Karamanidou, T. Stavropoulos, and S. Nikoletseas. 2024. Respiratory Diseases Diagnosis Using Audio Analysis and Artificial Intelligence: A Systematic Review. Sensors, Vol. 24, 4 (2024), 1173. Online publication date: 10 Feb 2024.
https://doi.org/10.3390/s24041173
Y. Sun, K. Xu, C. Liu, Y. Dou, H. Wang, B. Ding, and Q. Pan. 2024. Automated Data Augmentation for Audio Classification. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 32 (2024), 2716--2728. Online publication date: 16 May 2024.
https://dl.acm.org/doi/10.1109/TASLP.2024.3402049
G. Sharma, A. Dhall, and R. Subramanian. 2024. MARS: A Multiview Contrastive Approach to Human Activity Recognition From Accelerometer Sensor. IEEE Sensors Letters, Vol. 8, 3 (2024), 1--4. Online publication date: Mar 2024.
https://doi.org/10.1109/LSENS.2024.3357941
G. Gosztolya, V. Svindt, J. Bóna, and I. Hoffmann. 2023. Extracting Phonetic Posterior-Based Features for Detecting Multiple Sclerosis From Speech. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 31 (2023), 3234--3244. Online publication date: 2023.
https://doi.org/10.1109/TNSRE.2023.3300532
S. Sheikh, M. Sahidullah, F. Hirsch, and S. Ouni. 2023. Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning. IEEE Journal of Biomedical and Health Informatics, Vol. 27, 5 (2023), 2553--2564. Online publication date: May 2023.
https://doi.org/10.1109/JBHI.2023.3248281
Y. Yu, W. Qiu, C. Quan, K. Qian, Z. Wang, Y. Ma, B. Hu, B. Schuller, and Y. Yamamoto. 2023. Federated Intelligent Terminals Facilitate Stuttering Monitoring. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1--5. Online publication date: 4 Jun 2023.
https://doi.org/10.1109/ICASSP49357.2023.10097263
B. Atmaja and A. Sasou. 2023. Evaluating Variants of wav2vec 2.0 on Affective Vocal Burst Tasks. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1--5. Online publication date: 4 Jun 2023.
https://doi.org/10.1109/ICASSP49357.2023.10096552
S. Bayerl, D. Wagner, I. Baumann, T. Bocklet, and K. Riedhammer. 2023. Detecting Vocal Fatigue with Neural Embeddings. Journal of Voice (2023). Online publication date: Feb 2023.
https://doi.org/10.1016/j.jvoice.2023.01.012
