DOI: 10.1145/3395035.3425656

Eating Sound Dataset for 20 Food Types and Sound Classification Using Convolutional Neural Networks

Published: 27 December 2020

Abstract

Food identification technology can benefit both the food and media industries and enrich human-computer interaction. We assembled a food classification dataset of 11,141 clips, extracted from YouTube videos of 20 food types; the dataset is freely available on Kaggle. We recommend a grouped holdout protocol for assessing model performance. As a first approach, we applied convolutional neural networks to this dataset. Under the grouped holdout protocol, the model obtained an accuracy of 18.5%, whereas under a uniform holdout protocol it obtained 37.58%. When the task was framed as binary classification, the model performed well for most food pairs. In both settings, the method clearly outperformed reasonable baselines. We found that, besides texture properties, differences in eating actions are an important consideration for data-driven eating sound research. Protocols based solely on biting sounds are limited to textural classification and offer less heuristic value for capturing differences between foods.
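
The gap between the two protocols comes down to how clips are assigned to the train and test sets: under uniform holdout, clips cut from the same source video can land on both sides of the split, so a model can exploit shared recording conditions rather than the eating sounds themselves. Below is a minimal sketch of both protocols using scikit-learn, assuming a hypothetical per-clip metadata table with `path`, `label`, and `video_id` columns (these names are illustrative, not the dataset's actual schema):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical per-clip metadata; column names are assumptions for
# illustration, not the Kaggle dataset's actual schema.
clips = pd.DataFrame({
    "path":     ["v01_c1.wav", "v01_c2.wav", "v02_c1.wav", "v03_c1.wav"],
    "label":    ["chips", "chips", "carrots", "chips"],
    "video_id": ["v01", "v01", "v02", "v03"],
})

# Uniform holdout: clips from one video may land in both splits,
# which tends to inflate accuracy (the 37.58% setting).
train_u, test_u = train_test_split(clips, test_size=0.2, random_state=0)

# Grouped holdout: every clip from a given video stays on one side,
# giving a stricter estimate of generalization (the 18.5% setting).
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(clips, groups=clips["video_id"]))
train_g, test_g = clips.iloc[train_idx], clips.iloc[test_idx]

# Sanity check: no source video appears in both grouped splits.
assert set(train_g["video_id"]).isdisjoint(set(test_g["video_id"]))
```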
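
The paper's exact network is not reproduced on this page, so the sketch below shows only a generic pipeline of the kind the abstract describes: log-mel spectrograms extracted with librosa feeding a small Keras CNN over the 20 classes. The feature shape, architecture, and hyperparameters are assumptions, not the authors' configuration.

```python
import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 20            # 20 food types in the dataset
N_MELS, N_FRAMES = 64, 128  # assumed feature size, not from the paper

def clip_to_logmel(path):
    """Load one eating-sound clip and turn it into a fixed-size
    log-mel spectrogram; the padding/truncation length is an assumption."""
    y, sr = librosa.load(path)                # librosa's default 22050 Hz
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel, ref=np.max)
    if logmel.shape[1] < N_FRAMES:            # pad short clips on the time axis
        logmel = np.pad(logmel, ((0, 0), (0, N_FRAMES - logmel.shape[1])))
    return logmel[:, :N_FRAMES, np.newaxis]   # shape (N_MELS, N_FRAMES, 1)

# A small spectrogram CNN of the generic kind used for audio
# classification; not the architecture from the paper.
model = keras.Sequential([
    keras.Input(shape=(N_MELS, N_FRAMES, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Trained on `clip_to_logmel` features with integer labels, the same model evaluated under the grouped split will typically score well below its uniform-split accuracy, which is the gap reported above.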


Cited By

  • (2024) Automated detection and recognition system for chewable food items using advanced deep learning models. Scientific Reports 14(1). https://doi.org/10.1038/s41598-024-57077-z (19 March 2024)
  • (2023) Classification of crispness of food materials by deep neural networks. Journal of Texture Studies 54(6), 845-859. https://doi.org/10.1111/jtxs.12792 (August 2023)
  • (2022) What's on your plate? Collecting multimodal data to understand commensal behavior. Frontiers in Psychology 13. https://doi.org/10.3389/fpsyg.2022.911000 (30 September 2022)

Published In

ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
October 2020
548 pages
ISBN: 9781450380027
DOI: 10.1145/3395035

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. eating sound
  2. food classification
  3. neural networks
  4. sound classification
  5. sound dataset

Qualifiers

  • Research-article

Conference

ICMI '20: International Conference on Multimodal Interaction
October 25 - 29, 2020
Virtual Event, Netherlands

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
