ABSTRACT
In the rapidly evolving field of digital education, efficient and targeted access to information within video content has become critical. This study presents a system that extends the search capabilities of video platforms by generating summary videos in response to user queries. The system uses machine learning and natural language processing techniques to interpret complex user queries, pinpoint the video segment that provides the answer, and present the user with a summary video built around that segment. Preliminary evaluations suggest that the system can accurately identify relevant content and generate effective summaries.
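The query-to-summary flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the video already has a timestamped transcript, and it replaces the machine learning and natural language processing components with plain word overlap. The names `Segment`, `best_segment`, and `summary_window` are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped span of a video transcript (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str     # transcript text spoken during this span


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_segment(query: str, segments: list[Segment]) -> int:
    """Return the index of the segment sharing the most words with the query.

    Word overlap is a stand-in for the learned query understanding
    mentioned in the abstract. Ties resolve to the earliest segment.
    """
    query_words = tokenize(query)
    scores = [len(query_words & tokenize(s.text)) for s in segments]
    return max(range(len(segments)), key=scores.__getitem__)


def summary_window(
    segments: list[Segment], index: int, context: int = 1
) -> tuple[float, float]:
    """Return (start, end) in seconds of a clip around the chosen segment.

    `context` is the number of neighbouring segments kept on each side.
    """
    first = max(0, index - context)
    last = min(len(segments) - 1, index + context)
    return segments[first].start, segments[last].end


if __name__ == "__main__":
    transcript = [
        Segment(0, 30, "Welcome to the course on machine learning"),
        Segment(30, 75, "Gradient descent updates the weights using the learning rate"),
        Segment(75, 120, "Next we look at regularization"),
        Segment(120, 150, "Summary and homework"),
    ]
    hit = best_segment("How does the learning rate affect gradient descent?", transcript)
    print(hit, summary_window(transcript, hit))
```

Keeping a window of neighbouring segments, rather than the single best match, reflects the idea of a summary "around" the answering segment; the window size here is an arbitrary choice for the sketch.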
Index Terms
- QA-FastPerson: Extending Video Platform Search Capabilities by Creating Summary Videos in Response to User Queries