DOI: 10.1145/3490632.3490672
Research article · Open access

Studying Natural User Interfaces for Smart Video Annotation towards Ubiquitous Environments

Published: 25 February 2022

Abstract

Creativity and inspiration for problem-solving are critical skills in group-based learning environments. Communication practices have evolved continuously over the years, with growing use of multimedia elements such as video to increase audience impact. Annotations are a valuable means of remembering, reflecting, reasoning, and sharing thoughts during the learning process. However, controlling playback and adding notes during a video presentation is difficult, for instance in a classroom context: teachers often need to move around the room to interact with students, which leaves them physically far from the computer. We therefore developed a multimodal web video annotation tool that combines a voice interaction module with manual annotation capabilities, enabling more natural interactions in ubiquitous environments. We observed current video annotation practices and derived a new set of principles to guide our research. Natural language lets users express their intended actions while interacting with the web video player for annotation purposes. By studying and integrating new artificial intelligence techniques, we developed a customized set of natural language expressions that map user speech to specific software operations. Finally, the paper presents positive results from a user study conducted to evaluate our solution.
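The abstract describes mapping a customized set of natural language expressions onto specific player and annotation operations. The paper's actual grammar is not reproduced here; as a loose illustration only, a minimal pattern-matching interpreter for such a mapping might look like the following sketch (all command phrases and names are hypothetical, not the authors' API):

```javascript
// Hypothetical sketch: map recognized speech (e.g. from the Web Speech API
// or a Wit.ai intent) to video-player / annotation operations.
// The phrase patterns below are illustrative, not the paper's grammar.
const commands = [
  { pattern: /\b(pause|stop)\b/i, action: "pause" },
  { pattern: /\b(play|resume)\b/i, action: "play" },
  { pattern: /\bannotate\s+(.+)/i, action: "annotate" }, // captures note text
];

// Returns the matched action plus, for "annotate", the note text argument.
function interpret(utterance) {
  for (const { pattern, action } of commands) {
    const m = utterance.match(pattern);
    if (m) {
      return { action, arg: action === "annotate" ? m[1].trim() : null };
    }
  }
  return { action: "unknown", arg: null };
}

// Example: dispatch against a mock player object.
const player = {
  notes: [],
  play() { this.playing = true; },
  pause() { this.playing = false; },
  annotate(text) { this.notes.push(text); },
};

function dispatch(utterance) {
  const { action, arg } = interpret(utterance);
  if (action === "annotate") player.annotate(arg);
  else if (action !== "unknown") player[action]();
  return action;
}
```

In a real deployment the regular expressions would be replaced by a trained intent classifier (the paper cites Wit.ai and TensorFlow.js-style tooling), but the dispatch structure stays the same: recognized intent in, player operation out.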


Cited By

  • (2023) Sample-Based Human Movement Detection for Interactive Videos Applied to Performing Arts. Human-Computer Interaction – INTERACT 2023, pp. 567–587. DOI: 10.1007/978-3-031-42286-7_32. Online publication date: 28-Aug-2023.
  • (2022) Video Annotation Tool using Human Pose Estimation for Sports Training. Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, pp. 262–265. DOI: 10.1145/3568444.3570592. Online publication date: 27-Nov-2022.
  • (2022) Interactive Intelligent Tools for Creative Processes using Multimodal Information. Companion Proceedings of the 27th International Conference on Intelligent User Interfaces, pp. 134–137. DOI: 10.1145/3490100.3516479. Online publication date: 22-Mar-2022.


Published In

MUM '21: Proceedings of the 20th International Conference on Mobile and Ubiquitous Multimedia
December 2021, 263 pages
ISBN: 9781450386432
DOI: 10.1145/3490632

Publisher

Association for Computing Machinery, New York, NY, United States

            Author Tags

            1. AI-Based Tools
            2. HCI in Ubiquitous Environments
            3. Multimodal Interfaces
            4. Natural Language Processing
            5. Speech Interfaces
            6. Video Annotation

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            MUM 2021

Acceptance Rates

Overall acceptance rate: 190 of 465 submissions, 41%

