DOI: 10.1145/3595916.3626452
Research Article

Open-Vocabulary Segmentation Approach for Transformer-Based Food Nutrient Estimation

Published: 01 January 2024

Abstract

Nutrition plays a vital role in overall health and well-being. With a highly accurate nutrient estimation model, we develop a tool that displays nutritional values from food images, thereby reducing the labor-intensiveness of dietary assessment. We propose a method that uses depth data together with RGB images and incorporates an open-vocabulary segmentation process that separates food from non-food instances, coupled with a two-stage self-attention Transformer decoder. Our model outperforms the current state-of-the-art method, with an average percent MAE of 17.2% on Nutrition5k, an RGB-D food image dataset annotated with calories, mass, and three macronutrients. Our study also examines the significance of the food and background regions for calorie, mass, and nutrient estimation. We analyze the impact of non-food regions on each estimation task, with results suggesting that background information is crucial for calorie, mass, and carbohydrate estimation but less essential for protein and fat estimation. The qualitative results also show that the model attends to regions with a high corresponding nutritional value. Implementation code and pre-trained models are provided at https://github.com/Oatsty/nutrition5k.
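The headline metric, average percent MAE, can be understood as the mean absolute error normalized by the mean ground-truth value of each field (calories, mass, or a macronutrient). The sketch below is an illustrative implementation under that assumption; the paper's exact normalization and averaging across fields may differ.

```python
import numpy as np

def percent_mae(pred, target):
    """Mean absolute error as a percentage of the mean ground-truth value.

    A common way to report per-field error on Nutrition5k-style data
    (assumed normalization; check the paper for the exact definition).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mae = np.abs(pred - target).mean()          # mean absolute error
    return 100.0 * mae / target.mean()          # normalize by mean ground truth

# Toy example: predicted vs. annotated calories for three dishes.
print(round(percent_mae([210.0, 95.0, 330.0], [200.0, 100.0, 350.0]), 1))
```

A dataset-level score like the reported 17.2% would then be the average of this quantity over the five annotated fields.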


Cited By

  • (2024) Exploring the Trade-Off in the Variational Information Bottleneck for Regression with a Single Training Run. Entropy 26, 12 (2024), 1043. DOI: 10.3390/e26121043. Online publication date: 30-Nov-2024.
  • (2024) Measure and Improve Your Food: Ingredient Estimation Based Nutrition Calculator. In Proceedings of the 32nd ACM International Conference on Multimedia, 11273–11275. DOI: 10.1145/3664647.3684997. Online publication date: 28-Oct-2024.


Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
December 2023
745 pages
ISBN:9798400702051
DOI:10.1145/3595916

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dietary assessment
  2. neural networks
  3. nutrient estimation
  4. open-vocabulary segmentation
  5. transformers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • JST AIP
  • JSPS KAKENHI

Conference

MMAsia '23
Sponsor:
MMAsia '23: ACM Multimedia Asia
December 6 - 8, 2023
Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 132
  • Downloads (last 6 weeks): 15
Reflects downloads up to 28 Feb 2025
