research-article
DOI: 10.1145/3696409.3700256

Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions

Published: 28 December 2024

Abstract

In this paper, we propose a method for recognizing the ingredients present in each cooking step of multimedia recipes. We first introduce and validate three hypotheses about the characteristics of cooking steps: (1) ingredients are most difficult to recognize in the intermediate and finishing stages, where they lose their original appearance; (2) a step often inherits ingredients from an earlier step, but not necessarily from the immediately preceding one when the recipe contains parallel subtasks; and (3) the last step includes all ingredients used in the recipe. Based on these hypotheses, our method has two key components: (1) each step adaptively inherits features from similar preceding steps, in which ingredients are easier to recognize, and (2) thresholds are set adaptively for each class and each recipe using the method's prediction for the last step, where all ingredients appear. Experimental results show that our method outperforms the baseline methods.
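The abstract does not give implementation details, so the sketch below is only a rough illustration of the two components it describes: similarity-weighted inheritance of features from preceding steps, and per-class, per-recipe thresholds derived from the last step's prediction. The cosine-similarity weighting, the `margin` rule, and all function names here are assumptions of this sketch, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def inherit_features(step_feats: torch.Tensor) -> torch.Tensor:
    """Mix each step's feature with those of similar *preceding* steps.

    step_feats: (T, D) tensor, one feature vector per cooking step.
    Returns a (T, D) tensor of similarity-weighted, inherited features.
    """
    T = step_feats.size(0)
    # Pairwise cosine similarity between every pair of steps: (T, T).
    sims = F.cosine_similarity(step_feats.unsqueeze(1), step_feats.unsqueeze(0), dim=-1)
    # A step may only inherit from itself and earlier steps (causal mask).
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    weights = torch.softmax(sims.masked_fill(~mask, float("-inf")), dim=-1)
    return weights @ step_feats


def adaptive_thresholds(scores: torch.Tensor, margin: float = 0.05) -> torch.Tensor:
    """Derive per-class, per-recipe thresholds from the last step's scores,
    relying on the hypothesis that the last step contains every ingredient.

    scores: (T, C) per-step ingredient scores in [0, 1].
    """
    last_step = scores[-1]                      # (C,) scores on the final step
    # Hypothetical rule: accept an ingredient in any step whose score comes
    # within `margin` of its score in the final step.
    return (last_step - margin).clamp(min=0.0, max=1.0)


# Toy usage: 5 steps, 8 ingredient classes.
feats = torch.randn(5, 512)
mixed = inherit_features(feats)                 # features after inheritance
scores = torch.sigmoid(torch.randn(5, 8))       # stand-in classifier outputs
preds = scores >= adaptive_thresholds(scores)   # (5, 8) boolean predictions
```

In this sketch, the causal mask reflects hypothesis (2), allowing a step to pool features from any earlier step rather than only the immediately preceding one, while the threshold rule reflects hypothesis (3), anchoring each class's decision boundary to the last step, where all ingredients appear.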



    Published In

    MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia
    December 2024
    939 pages
    ISBN: 9798400712739
    DOI: 10.1145/3696409
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 December 2024


    Author Tags

    1. Recipe
    2. Multi-modal Annotation
    3. Multi-label recognition
    4. Datasets

    Qualifiers

    • Research-article

    Funding Sources

    • JSPS KAKENHI

    Conference

    MMAsia '24
    Sponsor: MMAsia '24: ACM Multimedia Asia
    December 3 - 6, 2024
    Auckland, New Zealand

    Acceptance Rates

    Overall Acceptance Rate 59 of 204 submissions, 29%
