DOI: 10.1145/3395035.3425656

Eating Sound Dataset for 20 Food Types and Sound Classification Using Convolutional Neural Networks

Published: 27 December 2020

Abstract

Food identification technology can benefit both the food and media industries and enrich human-computer interaction. We assembled a food classification dataset of 11,141 clips, extracted from YouTube videos of 20 food types; the dataset is freely available on Kaggle. We recommend a grouped holdout protocol for assessing model performance. As a first approach, we applied convolutional neural networks to this dataset. Under the grouped holdout protocol, the model obtained an accuracy of 18.5%, whereas under a uniform holdout protocol it obtained 37.58%. When the task was framed as binary classification, the model performed well for most food pairs. In both settings, the method clearly outperformed reasonable baselines. We found that, besides texture properties, differences in eating actions are an important consideration for data-driven eating sound research. Protocols based solely on biting sounds are limited to textural classification and offer less heuristic value for capturing differences between foods.
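
The gap between the two protocols comes down to how clips are assigned to the train and test sets: under uniform holdout, clips cut from the same source video can land on both sides of the split, so a model can exploit shared recording conditions rather than the eating sounds themselves. Below is a minimal sketch of both protocols using scikit-learn, assuming a hypothetical per-clip metadata table with `path`, `label`, and `video_id` columns (these names are illustrative, not the dataset's actual schema):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical per-clip metadata; column names are assumptions for
# illustration, not the Kaggle dataset's actual schema.
clips = pd.DataFrame({
    "path":     ["v01_c1.wav", "v01_c2.wav", "v02_c1.wav", "v03_c1.wav"],
    "label":    ["chips", "chips", "carrots", "chips"],
    "video_id": ["v01", "v01", "v02", "v03"],
})

# Uniform holdout: clips from one video may land in both splits,
# which tends to inflate accuracy (the 37.58% setting).
train_u, test_u = train_test_split(clips, test_size=0.2, random_state=0)

# Grouped holdout: every clip from a given video stays on one side,
# giving a stricter estimate of generalization (the 18.5% setting).
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(clips, groups=clips["video_id"]))
train_g, test_g = clips.iloc[train_idx], clips.iloc[test_idx]

# Sanity check: no source video appears in both grouped splits.
assert set(train_g["video_id"]).isdisjoint(set(test_g["video_id"]))
```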
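
The paper's exact network is not reproduced on this page, so the sketch below shows only a generic pipeline of the kind the abstract describes: log-mel spectrograms extracted with librosa feeding a small Keras CNN over the 20 classes. The feature shape, architecture, and hyperparameters are assumptions, not the authors' configuration.

```python
import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 20            # 20 food types in the dataset
N_MELS, N_FRAMES = 64, 128  # assumed feature size, not from the paper

def clip_to_logmel(path):
    """Load one eating-sound clip and turn it into a fixed-size
    log-mel spectrogram; the padding/truncation length is an assumption."""
    y, sr = librosa.load(path)                # librosa's default 22050 Hz
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel, ref=np.max)
    if logmel.shape[1] < N_FRAMES:            # pad short clips on the time axis
        logmel = np.pad(logmel, ((0, 0), (0, N_FRAMES - logmel.shape[1])))
    return logmel[:, :N_FRAMES, np.newaxis]   # shape (N_MELS, N_FRAMES, 1)

# A small spectrogram CNN of the generic kind used for audio
# classification; not the architecture from the paper.
model = keras.Sequential([
    keras.Input(shape=(N_MELS, N_FRAMES, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Trained on `clip_to_logmel` features with integer labels, the same model evaluated under the grouped split will typically score well below its uniform-split accuracy, which is the gap reported above.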


Cited By

  • (2024) Automated detection and recognition system for chewable food items using advanced deep learning models. Scientific Reports 14(1). https://doi.org/10.1038/s41598-024-57077-z (19 March 2024)
  • (2023) Classification of crispness of food materials by deep neural networks. Journal of Texture Studies 54(6), 845-859. https://doi.org/10.1111/jtxs.12792 (August 2023)
  • (2022) What's on your plate? Collecting multimodal data to understand commensal behavior. Frontiers in Psychology 13. https://doi.org/10.3389/fpsyg.2022.911000 (30 September 2022)

Published In

ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
October 2020
548 pages
ISBN: 9781450380027
DOI: 10.1145/3395035

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. eating sound
  2. food classification
  3. neural networks
  4. sound classification
  5. sound dataset

Qualifiers

  • Research-article

Conference

ICMI '20: International Conference on Multimodal Interaction
October 25 - 29, 2020
Virtual Event, Netherlands

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
