Exploring the User Interaction with a Multimodal Web-Based Video Annotator

  • Conference paper
Intelligent Technologies for Interactive Entertainment (INTETAIN 2021)

Abstract

People interact with their surroundings through multiple modalities. Human-computer interaction draws on these capabilities to provide experiences that are as natural and productive as possible through speech, touch, vision, and gesture. The web-based application presented in this paper is a multi-platform video annotation tool that supports multimodal interaction. The primary goal of MotionNotes is to foster the creativity of both professional and amateur users. Users can interact with the tool via keyboard, touch, and voice, and can add several types of annotations: voice, drawings, text, and marks. Furthermore, a real-time human pose identification feature was integrated into the annotation tool, enabling the suggestion of possible annotations. This paper presents and discusses results from a user study conducted to explore how users interact with the tool, evaluating the prototype and its different interaction methods. User feedback shows that this approach to video annotation is stimulating and can enhance users' creativity and productivity.
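To illustrate how real-time pose output can drive annotation suggestions of the kind the abstract describes, the sketch below scores a pose in the keypoint format produced by browser pose estimators such as PoseNet (`{part, score, position}`) and proposes an annotation anchor when the pose is confident enough. The function name, threshold, and data shapes are illustrative assumptions, not MotionNotes' actual implementation.

```javascript
// Minimal sketch: decide whether a detected pose is confident enough
// to suggest an annotation, and where to anchor it.
// Keypoint shape ({part, score, position: {x, y}}) follows the format
// used by browser pose estimators such as PoseNet; the threshold is
// an illustrative assumption.
function suggestAnnotation(pose, minScore = 0.5) {
  const confident = pose.keypoints.filter(k => k.score >= minScore);
  if (confident.length === 0) return null; // pose too uncertain to annotate

  // Anchor the suggestion at the highest-confidence keypoint.
  const best = confident.reduce((a, b) => (a.score >= b.score ? a : b));
  return {
    label: `possible annotation near ${best.part}`,
    x: best.position.x,
    y: best.position.y,
  };
}

// Example pose with two confident keypoints and one noisy one.
const pose = {
  keypoints: [
    { part: 'nose', score: 0.92, position: { x: 120, y: 80 } },
    { part: 'leftWrist', score: 0.71, position: { x: 60, y: 200 } },
    { part: 'rightWrist', score: 0.12, position: { x: 0, y: 0 } },
  ],
};
console.log(suggestAnnotation(pose)); // anchors at 'nose', the highest-confidence keypoint
```

In a live annotator, a function like this would run inside the pose estimator's per-frame callback, overlaying the suggestion on the video canvas for the user to accept or dismiss.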



Acknowledgements

This work is funded by Fundação para a Ciência e Tecnologia through a Ph.D. studentship grant (2020.09417.BD). It is supported by the CultureMoves project (Grant Agreement Number INEA/CEF/ICT/A2017/1568369) and by NOVA LINCS RC, partially funded by project UID/CEC/04516/2020 granted by FCT.

Author information

Corresponding author: Rui Rodrigues.


Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Rodrigues, R., Madeira, R.N., Correia, N. (2022). Exploring the User Interaction with a Multimodal Web-Based Video Annotator. In: Lv, Z., Song, H. (eds) Intelligent Technologies for Interactive Entertainment. INTETAIN 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-030-99188-3_2

  • DOI: https://doi.org/10.1007/978-3-030-99188-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99187-6

  • Online ISBN: 978-3-030-99188-3

  • eBook Packages: Computer Science (R0)
