A methodology for image annotation of human actions in videos

Published in Multimedia Tools and Applications

Abstract

In the context of video-based image classification, image annotation plays a vital role in improving classification decisions based on image semantics. Several methods, such as manual and semi-supervised annotation, have been introduced; however, formal specification, high cost, high probability of errors, and computation time remain major obstacles to image annotation. To overcome these issues, we propose a new image annotation technique consisting of three tiers, namely frame extraction, interest point generation, and clustering. The aim of the proposed technique is to automate label generation for video frames. Moreover, an evaluation model is used to assess the effectiveness of the proposed technique. The promising results indicate the effectiveness of the proposed technique (77% in terms of Adjusted Rand Index) for label generation for video frames. Finally, a comparative analysis is made between existing techniques and the proposed methodology.
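To make the three-tier pipeline concrete, the sketch below implements one plausible reading of it in Python, assuming OpenCV for frame extraction and SIFT interest points and scikit-learn for k-means clustering and the Adjusted Rand Index evaluation. It is a minimal illustration under those assumptions, not the paper's actual implementation; the function names and parameters (sample_rate, n_clusters) are hypothetical.

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def extract_frames(video_path, sample_rate=10):
    """Tier 1: sample every sample_rate-th frame as grayscale."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_rate == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        idx += 1
    cap.release()
    return frames

def frame_descriptors(frames):
    """Tier 2: detect SIFT interest points and reduce the 128-D local
    descriptors to one fixed-length vector per frame (their mean;
    zeros if a frame yields no keypoints)."""
    sift = cv2.SIFT_create()
    feats = []
    for frame in frames:
        _, desc = sift.detectAndCompute(frame, None)
        feats.append(desc.mean(axis=0) if desc is not None else np.zeros(128))
    return np.vstack(feats)

def annotate(frames, n_clusters=6):
    """Tier 3: cluster the frame descriptors; each cluster id serves as
    an automatically generated label for its frames."""
    X = frame_descriptors(frames)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

# Evaluation: agreement between generated labels and ground-truth action
# labels, measured with the Adjusted Rand Index as in the abstract.
# "video.avi" and labels_true are placeholders for a benchmark video and
# its ground-truth annotations.
# labels_pred = annotate(extract_frames("video.avi"))
# print("ARI:", adjusted_rand_score(labels_true, labels_pred))

Note that the cluster ids produced in tier 3 would still need to be mapped to human-readable action names; evaluating them against annotated benchmark labels, as the abstract describes, is what makes that mapping measurable.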



Author information


Corresponding author

Correspondence to Shahid Hussain.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Waheed, M., Hussain, S., Khan, A.A. et al. A methodology for image annotation of human actions in videos. Multimed Tools Appl 79, 24347–24365 (2020). https://doi.org/10.1007/s11042-020-09091-2
