
Object and motion cues based collaborative approach for human activity localization and recognition in unconstrained videos

Published in: Cluster Computing

Abstract

This paper addresses the problem of activity localization and recognition in large-scale video datasets through the collaborative use of holistic and motion-based information (called motion cues). The concept of salient objects is used to obtain the holistic information, while the motion cues are obtained from an affine motion model and optical flow. The motion cues compensate for camera motion and localize the object of interest within a set of object proposals. The holistic information and motion cues are then fused to obtain a reliable object of interest. In the recognition phase, holistic and motion-based features are extracted from the object of interest for training and testing the classifier. An extreme learning machine is adopted as the classifier to reduce training and testing time and to increase classification accuracy. The effectiveness of the proposed approach is evaluated on the UCF Sports dataset. Detailed experimentation reveals that the proposed approach outperforms state-of-the-art action localization and recognition approaches.
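The camera-motion compensation step described above can be illustrated with a minimal sketch: a six-parameter affine motion model is fitted to optical-flow vectors by least squares, and the residual flow (observed flow minus the affine prediction) highlights regions that move independently of the camera, such as the actor. The function names, synthetic flow field, and threshold below are illustrative assumptions, not the paper's implementation; in practice a robust multiresolution estimator would be used instead of a plain least-squares fit on hand-labelled background points.

```python
import numpy as np

def fit_affine_flow(points, flow):
    """Fit the 6-parameter affine motion model
    u = a1 + a2*x + a3*y,  v = a4 + a5*x + a6*y
    to observed flow vectors by least squares."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])
    pu, *_ = np.linalg.lstsq(A, flow[:, 0], rcond=None)
    pv, *_ = np.linalg.lstsq(A, flow[:, 1], rcond=None)
    return pu, pv

def residual_flow(points, flow, pu, pv):
    """Subtract the affine (camera) motion; large residuals mark
    independently moving regions."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])
    predicted = np.column_stack([A @ pu, A @ pv])
    return flow - predicted

# Synthetic example: affine background flow (camera pan/zoom)
# plus one patch with its own motion standing in for the actor.
xs, ys = np.meshgrid(np.arange(40), np.arange(30))
points = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
flow = np.column_stack([0.5 + 0.01 * points[:, 0],
                        -0.2 + 0.02 * points[:, 1]])
moving = ((points[:, 0] > 10) & (points[:, 0] < 15) &
          (points[:, 1] > 10) & (points[:, 1] < 15))
flow[moving] += np.array([3.0, 3.0])   # actor's independent motion

# For the demo, fit on known background points; a robust estimator
# would downweight the moving patch automatically.
pu, pv = fit_affine_flow(points[~moving], flow[~moving])
res = np.linalg.norm(residual_flow(points, flow, pu, pv), axis=1)
detected = res > 1.0                    # illustrative threshold
```

After compensation, only the patch with independent motion survives the residual threshold, which is the role the motion cues play in pruning the set of object proposals.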




Acknowledgements

This work is supported by Higher Education Commission of Pakistan.

Author information

Correspondence to Javid Ullah.


About this article


Cite this article

Ullah, J., Jaffar, M.A. Object and motion cues based collaborative approach for human activity localization and recognition in unconstrained videos. Cluster Comput 21, 311–322 (2018). https://doi.org/10.1007/s10586-017-0825-4
