Abstract
People interact with their surroundings through multiple modalities, and human-computer interfaces draw on these same capabilities, speech, touch, vision, and gesture, to provide experiences that are as natural and productive as possible. This paper presents MotionNotes, a multi-platform, Web-based video annotation tool that supports multimodal interaction, with the primary goal of fostering the creativity of both professional and amateur users. Users can interact with the tool via keyboard, touch, and voice, adding annotations of different types: voice, drawings, text, and marks. In addition, real-time human pose estimation was integrated into the annotation tool, enabling it to identify possible annotations. The paper presents and discusses the results of a user study conducted to explore interaction with the tool, evaluating the prototype and its different interaction methods. User feedback shows that this approach to video annotation is stimulating and can enhance users' creativity and productivity.
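As an illustration of how such a pose-estimation feature can be wired into a browser-based annotator, the sketch below attaches an estimator to an HTML video element. It assumes the ml5.js PoseNet wrapper loaded via a script tag; the element id `annotation-video` and the `suggestAnnotationAnchors` hook are hypothetical placeholders, not taken from the MotionNotes source, which the paper does not publish.

```ts
// Minimal sketch of real-time pose estimation over a video element, assuming
// the ml5.js PoseNet wrapper is loaded via a <script> tag.
declare const ml5: any; // ml5.js does not ship TypeScript typings

const video = document.getElementById('annotation-video') as HTMLVideoElement;

// Create the estimator; the callback fires once the model weights are loaded.
const poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));

// ml5 emits a 'pose' event with the keypoints detected in the current frame.
poseNet.on('pose', (results: any[]) => {
  for (const { pose } of results) {
    // Each keypoint has the shape { part: 'leftWrist', position: {x, y}, score }.
    const confident = pose.keypoints.filter((k: any) => k.score > 0.6);
    suggestAnnotationAnchors(confident); // hypothetical annotation hook
  }
});

// Illustrative stub: a real tool would render marks over the video frame here.
function suggestAnnotationAnchors(
  keypoints: { part: string; position: { x: number; y: number } }[]
): void {
  for (const k of keypoints) {
    console.log(`candidate anchor: ${k.part} at (${k.position.x}, ${k.position.y})`);
  }
}
```

Filtering keypoints by confidence score before proposing anchors is one way to keep low-quality detections from producing spurious annotation suggestions; the 0.6 threshold is an assumption, not a value from the paper.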
Acknowledgements
This work is funded by Fundação para a Ciência e Tecnologia (FCT) through Ph.D. Studentship grant 2020.09417.BD. It is also supported by the CultureMoves project (Grant Agreement INEA/CEF/ICT/A2017/1568369) and by NOVA LINCS, partially funded by FCT under project UID/CEC/04516/2020.
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Rodrigues, R., Madeira, R.N., Correia, N. (2022). Exploring the User Interaction with a Multimodal Web-Based Video Annotator. In: Lv, Z., Song, H. (eds) Intelligent Technologies for Interactive Entertainment. INTETAIN 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-030-99188-3_2
Print ISBN: 978-3-030-99187-6
Online ISBN: 978-3-030-99188-3