Abstract
Over the past few decades, considerable research has addressed automatically finding humans in a video sequence. Automatic human detection in video is attracting growing interest for numerous applications, such as driver assistance systems, security, people counting, human gait characterization, video annotation, retrieval, and crowd flow analysis. Manual annotation of a video is a time-consuming task that involves human annotators with varying biases. In this paper, we present three computer vision algorithms (contour-based, HOG-based, and SURF-based) and propose a deep learning technique that automatically extracts spatiotemporal annotations of humans and represents each of them by a bounding box. In our experiments, the accuracies obtained by the contour-based, HOG-based, SURF-based, and deep learning methods are 86%, 92.5%, 94%, and 95.5%, respectively. The results show that, compared with manual annotation, annotation accuracy increases while human effort is reduced. We also introduce a new dataset, ASSVS_KICS, which was captured with a high-quality stationary camera and contains scenarios based on our community for video surveillance research.
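To make the idea of a spatiotemporal bounding-box annotation concrete, the following minimal Python sketch shows one possible representation and one possible way to score detections against manual annotations. It is an illustration only: the abstract does not state how accuracy was computed, so the intersection-over-union (IoU) criterion, the 0.5 threshold, and the names `BoxAnnotation`, `iou`, and `matched_fraction` are all assumptions introduced here, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoxAnnotation:
    """Hypothetical record: one human, located in space (box) and time (frame)."""
    frame: int  # temporal part: index of the video frame
    x: float    # spatial part: left edge of the box, in pixels
    y: float    # top edge of the box, in pixels
    w: float    # box width, in pixels
    h: float    # box height, in pixels


def iou(a: BoxAnnotation, b: BoxAnnotation) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = overlap_w * overlap_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def matched_fraction(pred: List[BoxAnnotation],
                     truth: List[BoxAnnotation],
                     threshold: float = 0.5) -> float:
    """Fraction of manual boxes matched by a same-frame predicted box.

    Assumed scoring rule (not taken from the paper): a manual box counts
    as found when some prediction in the same frame reaches the IoU threshold.
    """
    if not truth:
        return 0.0
    hits = sum(
        1 for t in truth
        if any(p.frame == t.frame and iou(p, t) >= threshold for p in pred)
    )
    return hits / len(truth)
```

Under these assumptions, `matched_fraction(predicted, manual)` returns a value between 0 and 1 that can be read as a percentage. A stricter or looser threshold changes the score, which is one reason the criterion should be stated alongside any reported accuracy.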
Cite this article
Dilawari, A., Khan, M.U.G., ur Rehman, Z. et al. Toward Generating Human-Centered Video Annotations. Circuits Syst Signal Process 39, 857–883 (2020). https://doi.org/10.1007/s00034-019-01143-9
DOI: https://doi.org/10.1007/s00034-019-01143-9