Toward Generating Human-Centered Video Annotations

Dilawari, Aniqa; Khan, M. Usman Ghani; ur Rehman, Zahoor; Awan, Khalid Mahmood; Mehmood, Irfan; Rho, Seungmin

doi:10.1007/s00034-019-01143-9

Published: 24 May 2019

Volume 39, pages 857–883, (2020)

Circuits, Systems, and Signal Processing

Aniqa Dilawari^1,2,
M. Usman Ghani Khan^1,2,
Zahoor ur Rehman^3 (ORCID: orcid.org/0000-0001-9968-0330),
Khalid Mahmood Awan^3,
Irfan Mehmood^4 &
Seungmin Rho^4

Abstract

Over the past few decades, considerable research has addressed automatically finding humans in video sequences. Automatic human detection in video is gaining interest for numerous applications, such as driver assistance systems, security, people counting, human gait characterization, video annotation, retrieval, and crowd flow analysis. Manual annotation of a video is a time-consuming task that relies on human annotators with varying biases. In this paper, we present three computer vision algorithms (contour-based, HOG-based, and SURF-based) and propose a deep learning technique that automatically extracts spatiotemporal annotations of humans and represents each by a bounding box. In our experiments, the accuracies obtained for the contour-based, HOG-based, SURF-based, and deep learning methods are 86%, 92.5%, 94%, and 95.5%, respectively. The results show that not only has annotation accuracy increased, but human effort has also been reduced relative to manual annotation. We also introduce a new dataset, ASSVS_KICS, which is captured with a high-quality stationary camera and contains scenarios drawn from our community for video surveillance research.
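
As a rough illustration of what a spatiotemporal bounding-box annotation could look like and how it might be scored against manual annotations, consider the sketch below. It is not the authors' evaluation procedure: the abstract does not say how accuracy is computed, so the intersection-over-union (IoU) criterion and the 0.5 threshold are assumptions, and the names `Box`, `iou`, and `annotation_accuracy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    overlap_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = overlap_w * overlap_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def annotation_accuracy(predicted, manual, threshold=0.5):
    """Fraction of manually annotated frames whose predicted box matches.

    Both arguments map a frame index to a Box (one person per frame, an
    assumption made to keep the sketch short). A frame counts as correct
    when a prediction exists and its IoU with the manual box reaches
    `threshold` (assumed value, not taken from the paper).
    """
    if not manual:
        return 0.0
    hits = sum(
        1
        for frame, truth in manual.items()
        if frame in predicted and iou(predicted[frame], truth) >= threshold
    )
    return hits / len(manual)


# Two-frame toy example: frame 0 matches exactly, frame 1 misses entirely.
manual = {0: Box(10, 10, 50, 100), 1: Box(12, 10, 50, 100)}
predicted = {0: Box(10, 10, 50, 100), 1: Box(200, 200, 50, 100)}
score = annotation_accuracy(predicted, manual)
```

Keying the annotations by frame index is what makes them spatiotemporal in this sketch: the box gives the spatial extent and the key gives the position in time.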

Author information

Authors and Affiliations

1. Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
Aniqa Dilawari & M. Usman Ghani Khan
2. Al-Khawarizmi Institute of Computer Science, UET, Lahore, Pakistan
Aniqa Dilawari & M. Usman Ghani Khan
3. Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Pakistan
Zahoor ur Rehman & Khalid Mahmood Awan
4. Department of Software, Sejong University, Seoul, Korea
Irfan Mehmood & Seungmin Rho

Corresponding authors

Correspondence to Zahoor ur Rehman, Irfan Mehmood or Seungmin Rho.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dilawari, A., Khan, M.U.G., ur Rehman, Z. et al. Toward Generating Human-Centered Video Annotations. Circuits Syst Signal Process 39, 857–883 (2020). https://doi.org/10.1007/s00034-019-01143-9

Received: 12 March 2019
Revised: 09 May 2019
Accepted: 11 May 2019
Published: 24 May 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s00034-019-01143-9
