DOI: 10.1145/3490100.3516460
Work in Progress

Videos2Doc: Generating Documents from a Collection of Procedural Videos

Published: 22 March 2022

Abstract

The abundance of user-generated multi-modal content, including videos and images, makes it an attractive reference and source of information. Consuming this large corpus, however, can take many hours. In particular, for authors and content creators, abstracting information from videos and then representing it in textual form is a tedious task, and the challenge is compounded by the diversity and variety of the many videos associated with a given query or topic of interest. We present Videos2Doc, a machine-learning-based framework for automated document generation from a collection of procedural videos. Videos2Doc enables author-guided document generation for those seeking authoring assistance, and an easy consumption experience for those who prefer text or documents over videos. Our proposed interface lets users choose visual and semantic preferences for the output document, allowing custom documents and webpage templates to be generated from a given set of inputs. Empirical and qualitative evaluations establish the utility of Videos2Doc and its superiority over current benchmarks. We believe Videos2Doc will ease the task of making multimedia accessible through automated conversion to alternate presentation modes.
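The abstract gives no implementation detail, but one sub-problem it describes, combining several procedural videos on the same topic into a single document, can be illustrated with a purely hypothetical sketch (all names and data structures below are this editor's assumptions, not the authors' method): each video is assumed to have already been reduced to an ordered list of instruction steps with representative key frames, and the sketch merely deduplicates and renders them.

```python
from dataclasses import dataclass, field

# Hypothetical data model: a procedural video reduced to ordered steps,
# each pairing an instruction sentence with a key-frame identifier.
@dataclass
class Step:
    instruction: str
    frame_ref: str

@dataclass
class VideoSummary:
    title: str
    steps: list[Step] = field(default_factory=list)

def merge_summaries(summaries: list[VideoSummary]) -> list[Step]:
    """Naively merge step lists from several videos of the same procedure,
    keeping the first occurrence of each distinct instruction."""
    seen: set[str] = set()
    merged: list[Step] = []
    for summary in summaries:
        for step in summary.steps:
            key = step.instruction.lower().strip()
            if key not in seen:
                seen.add(key)
                merged.append(step)
    return merged

def render_document(title: str, steps: list[Step]) -> str:
    """Render merged steps as a simple numbered text document."""
    lines = [title, ""]
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step.instruction} [frame: {step.frame_ref}]")
    return "\n".join(lines)

a = VideoSummary("Omelette (video A)", [Step("Beat the eggs", "a_012"),
                                        Step("Heat the pan", "a_034")])
b = VideoSummary("Omelette (video B)", [Step("Heat the pan", "b_007"),
                                        Step("Add the eggs to the pan", "b_021")])
doc = render_document("How to Make an Omelette", merge_summaries([a, b]))
print(doc)
```

A real system would of course need the upstream stages the paper targets (step segmentation, frame selection, text generation) and a far less naive notion of step equivalence than exact string matching; this sketch only shows the final merge-and-render shape of such a pipeline.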

Supplementary Material

MP4 File (Videos2Doc.mp4)
Supplemental video



          Published In

          IUI '22 Companion: Companion Proceedings of the 27th International Conference on Intelligent User Interfaces
          March 2022
          142 pages
          ISBN:9781450391450
          DOI:10.1145/3490100
          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. and modeling approaches
          2. applications of intelligent user interfaces
          3. assistive technologies
          4. information retrieval
          5. recommendation system
          6. search

          Qualifiers

          • Work in progress
          • Research
          • Refereed limited

          Conference

          IUI '22

          Acceptance Rates

          Overall Acceptance Rate 746 of 2,811 submissions, 27%
