DOI: 10.1145/3490100.3516460
Work in Progress

Videos2Doc: Generating Documents from a Collection of Procedural Videos

Published: 22 March 2022

Abstract

The abundance of user-generated multi-modal content, including videos and images, makes it an attractive reference and source of information. Consuming this large corpus, however, can take many hours. In particular, for authors and content creators, abstracting information from videos and then representing it in textual form is a tedious task, and the challenge is compounded by the diversity and variety of the many videos associated with a given query or topic of interest. We present Videos2Doc, a machine-learning-based framework for automated document generation from a collection of procedural videos. Videos2Doc enables author-guided document generation for those seeking authoring assistance, and an easy consumption experience for those who prefer text or documents over videos. Our proposed interface lets users choose visual and semantic preferences for the output document, allowing custom documents and webpage templates to be generated from a given set of inputs. Empirical and qualitative evaluations establish the utility of Videos2Doc and its superiority over current benchmarks. We believe Videos2Doc will ease the task of making multimedia accessible through automated conversion to alternate presentation modes.
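The abstract gives no implementation detail, but one sub-problem it describes, combining several procedural videos on the same topic into a single document, can be illustrated with a purely hypothetical sketch (all names and data structures below are this editor's assumptions, not the authors' method): each video is assumed to have already been reduced to an ordered list of instruction steps with representative key frames, and the sketch merely deduplicates and renders them.

```python
from dataclasses import dataclass, field

# Hypothetical data model: a procedural video reduced to ordered steps,
# each pairing an instruction sentence with a key-frame identifier.
@dataclass
class Step:
    instruction: str
    frame_ref: str

@dataclass
class VideoSummary:
    title: str
    steps: list[Step] = field(default_factory=list)

def merge_summaries(summaries: list[VideoSummary]) -> list[Step]:
    """Naively merge step lists from several videos of the same procedure,
    keeping the first occurrence of each distinct instruction."""
    seen: set[str] = set()
    merged: list[Step] = []
    for summary in summaries:
        for step in summary.steps:
            key = step.instruction.lower().strip()
            if key not in seen:
                seen.add(key)
                merged.append(step)
    return merged

def render_document(title: str, steps: list[Step]) -> str:
    """Render merged steps as a simple numbered text document."""
    lines = [title, ""]
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step.instruction} [frame: {step.frame_ref}]")
    return "\n".join(lines)

a = VideoSummary("Omelette (video A)", [Step("Beat the eggs", "a_012"),
                                        Step("Heat the pan", "a_034")])
b = VideoSummary("Omelette (video B)", [Step("Heat the pan", "b_007"),
                                        Step("Add the eggs to the pan", "b_021")])
doc = render_document("How to Make an Omelette", merge_summaries([a, b]))
print(doc)
```

A real system would of course need the upstream stages the paper targets (step segmentation, frame selection, text generation) and a far less naive notion of step equivalence than exact string matching; this sketch only shows the final merge-and-render shape of such a pipeline.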

Supplementary Material

MP4 File (Videos2Doc.mp4)
Supplemental video



          Published In

          IUI '22 Companion: Companion Proceedings of the 27th International Conference on Intelligent User Interfaces
          March 2022
          142 pages
          ISBN:9781450391450
          DOI:10.1145/3490100
          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. and modeling approaches
          2. applications of intelligent user interfaces
          3. assistive technologies
          4. information retrieval
          5. recommendation system
          6. search

          Qualifiers

          • Work in progress
          • Research
          • Refereed limited

          Conference

          IUI '22

          Acceptance Rates

          Overall Acceptance Rate 746 of 2,811 submissions, 27%
