Human action recognition based on aggregated local motion estimates

  • Original Paper
Machine Vision and Applications

Abstract

This paper addresses human action recognition from optical flow. The task is challenging in itself, because optical flow estimates are inaccurate and noisy. Optical flow is one of the most popular descriptors for characterizing motion, but because of its instability it is usually combined with parametric models. In this work, we develop a non-parametric motion model that uses only the image region surrounding the actor performing the action. Specifically, for every two consecutive frames, a local motion descriptor is computed from optical flow orientation histograms collected inside the actor’s bounding box. An action descriptor is then built by weighting and aggregating these histograms along the temporal axis. Compared with state-of-the-art approaches, the proposed method achieves a promising trade-off between complexity and performance. Recognition can also be performed in real time by accumulating evidence from each new incoming frame. The proposal is evaluated experimentally on two well-known video sequence databases.
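The abstract does not give the histogram layout or the weighting scheme, so the following Python sketch only illustrates the general pipeline under stated assumptions. It assumes the dense flow for each frame pair has already been computed and is given as two 2-D arrays `u` and `v`. It also assumes the bounding box is split into a hypothetical `grid` × `grid` layout of cells, each holding an `n_bins` orientation histogram in which vectors vote with their magnitude, and that vectors shorter than `min_mag` are discarded as noise. Finally, it assumes each frame pair is weighted by its total flow magnitude during temporal aggregation. All names (`local_motion_descriptor`, `ActionAccumulator`, the parameters) are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: one flow component stored as rows of floats,
# and a half-open pixel box (x0, y0, x1, y1) around the actor.
Flow = Sequence[Sequence[float]]
Box = Tuple[int, int, int, int]


def local_motion_descriptor(u: Flow, v: Flow, box: Box, n_bins: int = 8,
                            grid: int = 2, min_mag: float = 0.1
                            ) -> Tuple[List[float], float]:
    """Orientation histograms of the flow (u, v) inside `box`.

    The box is split into grid x grid cells (an assumption); each cell
    holds an n_bins orientation histogram in which every flow vector
    votes with its magnitude. Vectors shorter than min_mag are skipped
    as noise. Returns the concatenated, L1-normalised histograms and the
    total magnitude that voted (used below as the frame-pair weight).
    """
    x0, y0, x1, y1 = box
    hist = [0.0] * (grid * grid * n_bins)
    total = 0.0
    for r in range(y0, y1):
        for c in range(x0, x1):
            mag = math.hypot(u[r][c], v[r][c])
            if mag < min_mag:
                continue
            # Orientation in [0, 2*pi), quantised into n_bins sectors.
            angle = math.atan2(v[r][c], u[r][c]) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
            # Spatial cell of the box that this pixel falls in.
            cell_row = (r - y0) * grid // (y1 - y0)
            cell_col = (c - x0) * grid // (x1 - x0)
            hist[(cell_row * grid + cell_col) * n_bins + b] += mag
            total += mag
    if total > 0:
        hist = [h / total for h in hist]
    return hist, total


class ActionAccumulator:
    """Running weighted mean of frame-pair descriptors.

    Each descriptor is weighted by its frame pair's total flow magnitude
    (an assumed weighting), so the action descriptor can be read after
    every new frame, in the spirit of real-time evidence accumulation.
    """

    def __init__(self, size: int) -> None:
        self.weighted_sum = [0.0] * size
        self.total_weight = 0.0

    def update(self, desc: Sequence[float], weight: float) -> None:
        for i, d in enumerate(desc):
            self.weighted_sum[i] += weight * d
        self.total_weight += weight

    def descriptor(self) -> List[float]:
        if self.total_weight == 0:
            return list(self.weighted_sum)
        return [s / self.total_weight for s in self.weighted_sum]


if __name__ == "__main__":
    # Two synthetic 4x4 frame pairs: motion to the right, then downward
    # (image y axis points down) with twice the magnitude.
    right = ([[1.0] * 4 for _ in range(4)], [[0.0] * 4 for _ in range(4)])
    down = ([[0.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)])
    acc = ActionAccumulator(2 * 2 * 8)
    for u, v in (right, down):
        desc, w = local_motion_descriptor(u, v, (0, 0, 4, 4))
        acc.update(desc, w)
    action = acc.descriptor()
    print(round(action[0], 4), round(action[2], 4))  # → 0.0833 0.1667
```

Keeping only a running weighted sum and a running total weight makes each update cost proportional to the descriptor length, independent of how many frames have already been seen. That is what allows the descriptor to be queried after every incoming frame. A classifier, which is not shown here, would then be applied to the output of `descriptor()`.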

Author information

Correspondence to M. Lucena.

Electronic Supplementary Material

The following electronic supplementary material is available for this article.

ESM 1 (PDF, 21 kB)

About this article

Cite this article

Lucena, M., Pérez de la Blanca, N. & Fuertes, J.M. Human action recognition based on aggregated local motion estimates. Machine Vision and Applications 23, 135–150 (2012). https://doi.org/10.1007/s00138-010-0305-9

  • DOI: https://doi.org/10.1007/s00138-010-0305-9
