DOI: 10.1145/3378184.3378217
research-article

Generating Cooking Recipes from Cooking Videos Using Deep Learning Considering Previous Process with Video Encoding

Published: 17 February 2020

Abstract

Research on generating natural language captions for visual data such as images and videos has produced considerable results with deep learning methods and has attracted attention in recent years. In this research, we aim to generate recipe sentences from cooking videos acquired from YouTube. We treat the task as image captioning, and two aspects must be considered in order to do so. First, we believe that the semantics of each process should be taken into account to improve the captioning's accuracy. Second, data processing, that is, obtaining images from each process using several visual processing methods such as object detection, is important. We propose a captioning model in which a sentence vector is embedded to consider the consistency of the recipe. From the differences between the generated recipes and the reference recipe, we calculate recipe scores. We use three metrics from previous studies on evaluating image captioning models, and we compare the scores with those of baseline models.
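As a rough illustration of scoring a generated recipe against a reference recipe, the sketch below computes a clipped n-gram precision, the core quantity behind BLEU-style metrics. The abstract does not name the three metrics used, so the choice of metric here is an assumption. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision (illustrative, not the paper's exact metric).

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference sentence.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


# Hypothetical generated and reference recipe steps.
generated = "chop the carrots and fry them"
reference = "chop the carrots then fry the carrots"

unigram_score = clipped_precision(generated, reference, n=1)
bigram_score = clipped_precision(generated, reference, n=2)
print(f"unigram precision: {unigram_score:.3f}")
print(f"bigram precision:  {bigram_score:.3f}")
```

A full BLEU score would additionally combine several n-gram orders and apply a brevity penalty. This sketch only shows how overlap with the reference recipe turns into a number that can be compared across models.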


    Published In

    APPIS 2020: Proceedings of the 3rd International Conference on Applications of Intelligent Systems
    January 2020
    214 pages
    ISBN:9781450376303
    DOI:10.1145/3378184

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Captioning
    2. Deep Learning
    3. Object Detection
    4. Recipe Generation
    5. Sentence Vector
    6. Video Encoding

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    APPIS 2020
