DOI: 10.1145/3490632.3490672
Research article · Open access

Studying Natural User Interfaces for Smart Video Annotation towards Ubiquitous Environments

Published: 25 February 2022

Abstract

Creativity and inspiration for problem-solving are critical skills in group-based learning environments. Communication practices have evolved continuously over the years, with growing use of multimedia elements such as video to increase audience impact. Annotations are a valuable means of remembering, reflecting, reasoning, and sharing thoughts during the learning process. However, controlling playback and adding notes during a video presentation is difficult, for instance in a classroom context: teachers often need to move around the room to interact with students, which leaves them physically far from the computer. We therefore developed a multimodal web video annotation tool that combines a voice interaction module with manual annotation capabilities, enabling more natural interactions in ubiquitous environments. We observed current video annotation practices and derived a new set of principles to guide our research. Natural language lets users express their intended actions while interacting with the web video player for annotation purposes. By studying and integrating new artificial intelligence techniques, we developed a customized set of natural language expressions that map user speech to specific software operations. Finally, the paper presents positive results from a user study conducted to evaluate our solution.
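The abstract describes mapping a customized set of natural language expressions onto specific player and annotation operations. The paper's actual grammar is not reproduced here; as a loose illustration only, a minimal pattern-matching interpreter for such a mapping might look like the following sketch (all command phrases and names are hypothetical, not the authors' API):

```javascript
// Hypothetical sketch: map recognized speech (e.g. from the Web Speech API
// or a Wit.ai intent) to video-player / annotation operations.
// The phrase patterns below are illustrative, not the paper's grammar.
const commands = [
  { pattern: /\b(pause|stop)\b/i, action: "pause" },
  { pattern: /\b(play|resume)\b/i, action: "play" },
  { pattern: /\bannotate\s+(.+)/i, action: "annotate" }, // captures note text
];

// Returns the matched action plus, for "annotate", the note text argument.
function interpret(utterance) {
  for (const { pattern, action } of commands) {
    const m = utterance.match(pattern);
    if (m) {
      return { action, arg: action === "annotate" ? m[1].trim() : null };
    }
  }
  return { action: "unknown", arg: null };
}

// Example: dispatch against a mock player object.
const player = {
  notes: [],
  play() { this.playing = true; },
  pause() { this.playing = false; },
  annotate(text) { this.notes.push(text); },
};

function dispatch(utterance) {
  const { action, arg } = interpret(utterance);
  if (action === "annotate") player.annotate(arg);
  else if (action !== "unknown") player[action]();
  return action;
}
```

In a real deployment the regular expressions would be replaced by a trained intent classifier (the paper cites Wit.ai and TensorFlow.js-style tooling), but the dispatch structure stays the same: recognized intent in, player operation out.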


Cited By

  • (2023) Sample-Based Human Movement Detection for Interactive Videos Applied to Performing Arts. Human-Computer Interaction – INTERACT 2023, pp. 567–587. DOI: 10.1007/978-3-031-42286-7_32. Online publication date: 28-Aug-2023.
  • (2022) Video Annotation Tool using Human Pose Estimation for Sports Training. Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, pp. 262–265. DOI: 10.1145/3568444.3570592. Online publication date: 27-Nov-2022.
  • (2022) Interactive Intelligent Tools for Creative Processes using Multimodal Information. Companion Proceedings of the 27th International Conference on Intelligent User Interfaces, pp. 134–137. DOI: 10.1145/3490100.3516479. Online publication date: 22-Mar-2022.


Published In

MUM '21: Proceedings of the 20th International Conference on Mobile and Ubiquitous Multimedia
December 2021, 263 pages
ISBN: 9781450386432
DOI: 10.1145/3490632

Publisher

Association for Computing Machinery, New York, NY, United States

            Author Tags

            1. AI-Based Tools
            2. HCI in Ubiquitous Environments
            3. Multimodal Interfaces
            4. Natural Language Processing
            5. Speech Interfaces
            6. Video Annotation

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            MUM 2021

Acceptance Rates

Overall acceptance rate: 190 of 465 submissions, 41%

