research-article
DOI: 10.1145/3696409.3700256

Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions

Published: 28 December 2024

Abstract

In this paper, we propose a method for recognizing the ingredients present in each cooking step of multimedia recipes. We first introduce and validate three hypotheses about the characteristics of cooking steps: (1) ingredients are most difficult to recognize in the intermediate and finishing stages, where they lose their original appearance; (2) a step often inherits ingredients from an earlier step, but not necessarily from the immediately preceding one when the recipe contains parallel subtasks; and (3) the last step includes all ingredients used in the recipe. Based on these hypotheses, our method has two key components: (1) each step adaptively inherits features from similar preceding steps, in which ingredients are easier to recognize, and (2) thresholds are set adaptively for each class and each recipe using the method's prediction for the last step, where all ingredients appear. Experimental results show that our method outperforms the baseline methods.
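The abstract does not give implementation details, so the sketch below is only a rough illustration of the two components it describes: similarity-weighted inheritance of features from preceding steps, and per-class, per-recipe thresholds derived from the last step's prediction. The cosine-similarity weighting, the `margin` rule, and all function names here are assumptions of this sketch, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def inherit_features(step_feats: torch.Tensor) -> torch.Tensor:
    """Mix each step's feature with those of similar *preceding* steps.

    step_feats: (T, D) tensor, one feature vector per cooking step.
    Returns a (T, D) tensor of similarity-weighted, inherited features.
    """
    T = step_feats.size(0)
    # Pairwise cosine similarity between every pair of steps: (T, T).
    sims = F.cosine_similarity(step_feats.unsqueeze(1), step_feats.unsqueeze(0), dim=-1)
    # A step may only inherit from itself and earlier steps (causal mask).
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    weights = torch.softmax(sims.masked_fill(~mask, float("-inf")), dim=-1)
    return weights @ step_feats


def adaptive_thresholds(scores: torch.Tensor, margin: float = 0.05) -> torch.Tensor:
    """Derive per-class, per-recipe thresholds from the last step's scores,
    relying on the hypothesis that the last step contains every ingredient.

    scores: (T, C) per-step ingredient scores in [0, 1].
    """
    last_step = scores[-1]                      # (C,) scores on the final step
    # Hypothetical rule: accept an ingredient in any step whose score comes
    # within `margin` of its score in the final step.
    return (last_step - margin).clamp(min=0.0, max=1.0)


# Toy usage: 5 steps, 8 ingredient classes.
feats = torch.randn(5, 512)
mixed = inherit_features(feats)                 # features after inheritance
scores = torch.sigmoid(torch.randn(5, 8))       # stand-in classifier outputs
preds = scores >= adaptive_thresholds(scores)   # (5, 8) boolean predictions
```

In this sketch, the causal mask reflects hypothesis (2), allowing a step to pool features from any earlier step rather than only the immediately preceding one, while the threshold rule reflects hypothesis (3), anchoring each class's decision boundary to the last step, where all ingredients appear.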



    Published In

    MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia
    December 2024
    939 pages
    ISBN: 9798400712739
    DOI: 10.1145/3696409
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 December 2024


    Author Tags

    1. Recipe
    2. Multi-modal Annotation
    3. Multi-label recognition
    4. Datasets

    Qualifiers

    • Research-article

    Funding Sources

    • JSPS KAKENHI

    Conference

    MMAsia '24
    Sponsor: MMAsia '24: ACM Multimedia Asia
    December 3 - 6, 2024
    Auckland, New Zealand

    Acceptance Rates

    Overall Acceptance Rate 59 of 204 submissions, 29%
