DOI: 10.1145/3595916.3626452
Research Article

Open-Vocabulary Segmentation Approach for Transformer-Based Food Nutrient Estimation

Published: 01 January 2024

Abstract

Nutrition plays a vital role in overall health and well-being. With a highly accurate nutrient estimation model, we develop a tool that displays nutritional values from food images, thereby reducing the labor-intensiveness of dietary assessment. We propose a method that uses depth data together with RGB images and incorporates an open-vocabulary segmentation process that separates food from non-food instances, coupled with a two-stage self-attention Transformer decoder. Our model outperforms the current state-of-the-art method, with an average percent MAE of 17.2% on Nutrition5k, an RGB-D food image dataset annotated with calories, mass, and three macronutrients. Our study also examines the significance of the food and background regions for calorie, mass, and nutrient estimation. We analyze the impact of non-food regions on each estimation task, with results suggesting that background information is crucial for calorie, mass, and carbohydrate estimation but less essential for protein and fat estimation. The qualitative results also show that the model attends to regions with a high corresponding nutritional value. Implementation code and pre-trained models are provided at https://github.com/Oatsty/nutrition5k.
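The headline metric, average percent MAE, can be understood as the mean absolute error normalized by the mean ground-truth value of each field (calories, mass, or a macronutrient). The sketch below is an illustrative implementation under that assumption; the paper's exact normalization and averaging across fields may differ.

```python
import numpy as np

def percent_mae(pred, target):
    """Mean absolute error as a percentage of the mean ground-truth value.

    A common way to report per-field error on Nutrition5k-style data
    (assumed normalization; check the paper for the exact definition).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mae = np.abs(pred - target).mean()          # mean absolute error
    return 100.0 * mae / target.mean()          # normalize by mean ground truth

# Toy example: predicted vs. annotated calories for three dishes.
print(round(percent_mae([210.0, 95.0, 330.0], [200.0, 100.0, 350.0]), 1))
```

A dataset-level score like the reported 17.2% would then be the average of this quantity over the five annotated fields.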


Cited By

  • (2024) Exploring the Trade-Off in the Variational Information Bottleneck for Regression with a Single Training Run. Entropy 26, 12 (2024), 1043. DOI: 10.3390/e26121043. Online publication date: 30-Nov-2024.
  • (2024) Measure and Improve Your Food: Ingredient Estimation Based Nutrition Calculator. In Proceedings of the 32nd ACM International Conference on Multimedia, 11273–11275. DOI: 10.1145/3664647.3684997. Online publication date: 28-Oct-2024.


Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
December 2023
745 pages
ISBN:9798400702051
DOI:10.1145/3595916

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dietary assessment
  2. neural networks
  3. nutrient estimation
  4. open-vocabulary segmentation
  5. transformers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • JST AIP
  • JSPS KAKENHI

Conference

MMAsia '23
Sponsor:
MMAsia '23: ACM Multimedia Asia
December 6 - 8, 2023
Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 132
  • Downloads (last 6 weeks): 15
Reflects downloads up to 28 Feb 2025
