Generating Cooking Recipes from Cooking Videos Using Deep Learning Considering Previous Process with Video Encoding

Authors:

Ryohei Orihara,

Yasuyuki Tahara,

Akihiko Ohsuga

APPIS 2020: Proceedings of the 3rd International Conference on Applications of Intelligent Systems

Article No.: 21, Pages 1 - 5

https://doi.org/10.1145/3378184.3378217

Published: 17 February 2020

Abstract

Research on generating natural-language captions for visual data such as images and videos has produced considerable results with deep learning methods and has attracted attention in recent years. In this research, we aim to generate recipe sentences from cooking videos acquired from YouTube. We treat the task as image captioning, and two aspects must be considered in doing so. First, we believe that the semantics of each cooking process should be taken into account to improve captioning accuracy. Second, data processing, that is, obtaining images for each process using visual processing methods such as object detection, is also important. We propose a captioning model in which a sentence vector is embedded so that the consistency of the recipe is taken into account. We calculate recipe scores from the differences between the generated recipes and the reference recipe, using three metrics from previous studies on evaluating image captioning models, and compare the scores with those of baseline models.
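To make the scoring step concrete, the following is a minimal sketch of how a generated recipe sentence could be scored against a reference sentence. The abstract does not name its three metrics or describe their implementation, so this sketch assumes a simplified BLEU-style score (clipped n-gram precision with a brevity penalty) purely for illustration. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: candidate counts are clipped by reference counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def bleu_like(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions (n = 1..max_n) times a brevity penalty.

    This is an illustrative simplification, not the paper's evaluation code.
    """
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize candidates that are shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences (not from the paper's dataset).
reference = "chop the carrots and fry them in oil".split()
generated = "chop the carrots and fry them".split()
score = bleu_like(generated, reference)
print(f"BLEU-like score: {score:.3f}")
```

In this example every n-gram of the generated sentence appears in the reference, so the score is lowered only by the brevity penalty for the missing words. A real evaluation would normally rely on an established implementation of each metric rather than a hand-written one.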

Cited By

Sei Y, Enoki K, Yamaguchi S, Saito K (2022). Prediction of Boiling Heat Transfer Coefficients for Mini-Channels. Multiphase Science and Technology 34:2 (43-65). Online publication date: 2022. https://doi.org/10.1615/MultScienTechn.2022039089

Gim M, Choi D, Maruyama K, Choi J, Kim H, Park D, Kang J, Al Hasan M, Xiong L (2022). RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (3092-3102). Online publication date: 17-Oct-2022. https://dl.acm.org/doi/10.1145/3511808.3557092

Index Terms

Generating Cooking Recipes from Cooking Videos Using Deep Learning Considering Previous Process with Video Encoding
1. Computing methodologies
  1. Machine learning

Published In

APPIS 2020: Proceedings of the 3rd International Conference on Applications of Intelligent Systems

January 2020

214 pages

ISBN:9781450376303

DOI:10.1145/3378184

Editors:
Nicolai Petkov (University of Groningen, The Netherlands),
Nicola Strisciuglio (University of Twente, The Netherlands),
Carlos M. Travieso-González (University of Las Palmas de Gran Canaria, Spain)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

APPIS 2020

APPIS 2020: 3rd International Conference on Applications of Intelligent Systems

January 7 - 9, 2020

Las Palmas de Gran Canaria, Spain
