
Toward Generating Human-Centered Video Annotations

Circuits, Systems, and Signal Processing

Abstract

In the past few decades, research has been carried out on automatically finding humans in video sequences. Automatic human detection in videos is gaining interest for numerous applications, such as driver assistance systems, security, people counting, human gait characterization, video annotation and retrieval, and crowd flow analysis. Manual annotation of a video is a time-consuming task that relies on human annotators with varying biases. In this paper, we present three computer vision algorithms (contour-based, HOG-based, and SURF-based) and propose a deep learning technique for automatically extracting spatiotemporal annotations of humans, each represented by a bounding box. In our experiments, these four methods achieve accuracies of 86%, 92.5%, 94%, and 95.5%, respectively. The results show that, compared with manual annotation, annotation accuracy increases while human effort is reduced. We also introduce a new dataset, ASSVS_KICS, which was captured with a high-quality stationary camera and contains scenarios based on our community, for video surveillance research.
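The abstract reports an accuracy figure for each method's bounding-box annotations but does not define the metric on this page. As a hedged illustration only, the sketch below assumes one common convention: annotations are stored per frame, and a ground-truth human box counts as correctly annotated when a distinct predicted box in the same frame overlaps it with intersection-over-union (IoU) of at least 0.5. The `Box` type, the frame-indexed dictionaries, the greedy matching, and the 0.5 threshold are all assumptions for illustration, not the authors' evaluation protocol.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box from corner (x1, y1) to corner (x2, y2), in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


# Hypothetical spatiotemporal annotation format: frame index -> human boxes in that frame.
Annotations = Dict[int, List[Box]]


def annotation_accuracy(truth: Annotations, predicted: Annotations,
                        threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched one-to-one by a predicted box
    in the same frame with IoU >= threshold (assumed metric, greedy matching)."""
    total = matched = 0
    for frame, gt_boxes in truth.items():
        unused = list(predicted.get(frame, []))
        for gt in gt_boxes:
            total += 1
            # Take the best still-unmatched prediction for this ground-truth box.
            best = max(unused, key=lambda p: iou(gt, p), default=None)
            if best is not None and iou(gt, best) >= threshold:
                matched += 1
                unused.remove(best)
    return matched / total if total else 0.0


if __name__ == "__main__":
    truth = {0: [Box(10, 10, 50, 110)],
             1: [Box(12, 10, 52, 110), Box(200, 40, 240, 140)]}
    pred = {0: [Box(12, 12, 50, 108)],
            1: [Box(14, 10, 54, 112)]}
    # Two of the three ground-truth humans are matched, so this prints about 0.667.
    print(annotation_accuracy(truth, pred))
```

Greedy one-to-one matching keeps a single predicted box from being credited for two different people; stricter protocols use optimal (Hungarian) assignment instead, which can differ when boxes overlap heavily.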


Figs. 1–16


Author information

Correspondence to Zahoor ur Rehman, Irfan Mehmood or Seungmin Rho.


About this article

Cite this article

Dilawari, A., Khan, M.U.G., ur Rehman, Z. et al. Toward Generating Human-Centered Video Annotations. Circuits Syst Signal Process 39, 857–883 (2020). https://doi.org/10.1007/s00034-019-01143-9

  • DOI: https://doi.org/10.1007/s00034-019-01143-9
