Exploring the User Interaction with a Multimodal Web-Based Video Annotator

  • Conference paper
Intelligent Technologies for Interactive Entertainment (INTETAIN 2021)

Abstract

People interact with their surroundings through multiple modalities. Human-computer interaction draws on these capabilities to provide experiences that are as natural and productive as possible through speech, touch, vision, and gesture. The web-based application presented in this paper is a multi-platform video annotation tool that supports multimodal interaction. The primary goal of MotionNotes is to foster the creativity of both professional and amateur users. Users can interact with the tool via keyboard, touch, and voice, and can add several types of annotations: voice, drawings, text, and marks. Furthermore, a real-time human pose identification feature was integrated into the annotation tool, enabling the suggestion of possible annotations. This paper presents and discusses results from a user study conducted to explore how users interact with the tool, evaluating the prototype and its different interaction methods. User feedback shows that this approach to video annotation is stimulating and can enhance users' creativity and productivity.
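To illustrate how real-time pose output can drive annotation suggestions of the kind the abstract describes, the sketch below scores a pose in the keypoint format produced by browser pose estimators such as PoseNet (`{part, score, position}`) and proposes an annotation anchor when the pose is confident enough. The function name, threshold, and data shapes are illustrative assumptions, not MotionNotes' actual implementation.

```javascript
// Minimal sketch: decide whether a detected pose is confident enough
// to suggest an annotation, and where to anchor it.
// Keypoint shape ({part, score, position: {x, y}}) follows the format
// used by browser pose estimators such as PoseNet; the threshold is
// an illustrative assumption.
function suggestAnnotation(pose, minScore = 0.5) {
  const confident = pose.keypoints.filter(k => k.score >= minScore);
  if (confident.length === 0) return null; // pose too uncertain to annotate

  // Anchor the suggestion at the highest-confidence keypoint.
  const best = confident.reduce((a, b) => (a.score >= b.score ? a : b));
  return {
    label: `possible annotation near ${best.part}`,
    x: best.position.x,
    y: best.position.y,
  };
}

// Example pose with two confident keypoints and one noisy one.
const pose = {
  keypoints: [
    { part: 'nose', score: 0.92, position: { x: 120, y: 80 } },
    { part: 'leftWrist', score: 0.71, position: { x: 60, y: 200 } },
    { part: 'rightWrist', score: 0.12, position: { x: 0, y: 0 } },
  ],
};
console.log(suggestAnnotation(pose)); // anchors at 'nose', the highest-confidence keypoint
```

In a live annotator, a function like this would run inside the pose estimator's per-frame callback, overlaying the suggestion on the video canvas for the user to accept or dismiss.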



Acknowledgements

This work is funded by Fundação para a Ciência e Tecnologia through a Ph.D. studentship grant (2020.09417.BD). It is supported by the CultureMoves project (Grant Agreement Number INEA/CEF/ICT/A2017/1568369) and by NOVA LINCS RC, partially funded by project UID/CEC/04516/2020 granted by FCT.

Author information

Corresponding author: Rui Rodrigues.


Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Rodrigues, R., Madeira, R.N., Correia, N. (2022). Exploring the User Interaction with a Multimodal Web-Based Video Annotator. In: Lv, Z., Song, H. (eds) Intelligent Technologies for Interactive Entertainment. INTETAIN 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-030-99188-3_2

  • DOI: https://doi.org/10.1007/978-3-030-99188-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99187-6

  • Online ISBN: 978-3-030-99188-3

  • eBook Packages: Computer Science (R0)
