Abstract
This paper addresses the problem of activity localization and recognition in large-scale video datasets through the collaborative use of holistic information and motion-based information (motion cues). Holistic information is obtained through the concept of salient objects, while motion cues are derived from an affine motion model and optical flow. The motion cues compensate for camera motion and localize the object of interest within a set of object proposals. The holistic information and motion cues are then fused to obtain a reliable object of interest. In the recognition phase, holistic and motion-based features are extracted from the object of interest to train and test the classifier. An extreme learning machine is adopted as the classifier to reduce training and testing time and to increase classification accuracy. The effectiveness of the proposed approach is evaluated on the UCF Sports dataset. Detailed experiments show that the proposed approach outperforms state-of-the-art action localization and recognition approaches.
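The motion-cue step summarized above can be pictured with a short sketch. The following Python/OpenCV snippet is only an illustration of the general idea, not the authors' implementation: it estimates a global affine model from tracked corners to approximate camera motion, compensates for it, and keeps the magnitude of the residual dense optical flow as a cue for independently moving regions. All function choices and parameter values here are assumptions.

```python
# Minimal sketch of camera-motion compensation followed by a residual
# optical-flow motion cue. Illustrative only; parameters are assumptions.
import cv2
import numpy as np

def residual_motion_cue(prev_gray, curr_gray):
    # Track corners to estimate the dominant (camera) motion as an affine model.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_prev = pts[status.flatten() == 1]
    good_next = nxt[status.flatten() == 1]
    affine, _ = cv2.estimateAffinePartial2D(good_prev, good_next)
    if affine is None:
        # Fall back to the identity transform if estimation fails.
        affine = np.eye(2, 3, dtype=np.float32)

    # Warp the previous frame with the affine model so that camera motion
    # is (approximately) removed before computing dense optical flow.
    h, w = prev_gray.shape
    stabilized = cv2.warpAffine(prev_gray, affine, (w, h))

    # Dense optical flow on the compensated pair; its magnitude highlights
    # independently moving regions (candidate objects of interest).
    flow = cv2.calcOpticalFlowFarneback(stabilized, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2)  # high values ~ object motion
```

In a pipeline of this kind, the resulting motion map would typically be used to score or rank object proposals before fusing them with the saliency-based holistic information.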
Acknowledgements
This work was supported by the Higher Education Commission of Pakistan.
Cite this article
Ullah, J., Jaffar, M.A. Object and motion cues based collaborative approach for human activity localization and recognition in unconstrained videos. Cluster Comput 21, 311–322 (2018). https://doi.org/10.1007/s10586-017-0825-4