Abstract
This paper addresses human action recognition from optical flow. The task is an interesting problem in itself, given the inaccuracy and noise of optical flow estimation. Optical flow is one of the most popular descriptors for characterizing motion, but because of its instability it is usually used in combination with parametric models. In this work, we develop a non-parametric motion model that uses only the image region surrounding the actor performing the action. Specifically, for every pair of consecutive frames, a local motion descriptor is computed from the optical flow orientation histograms collected inside the actor’s bounding box. An action descriptor is then built by weighting and aggregating these histograms along the temporal axis. The proposed approach obtains a promising trade-off between complexity and performance compared with state-of-the-art approaches. Action recognition can also be performed in real time by accumulating evidence from each new incoming image. Experiments on two well-known video sequence databases evaluate the behavior of the proposal.
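As a rough illustration of the descriptor outlined in the abstract, the Python sketch below builds an orientation histogram from the flow vectors inside a bounding box and then aggregates per-frame histograms along time. The number of bins, the magnitude-weighted voting, the normalization, and the temporal weights are assumptions made for this example; the abstract does not specify them. The function names `orientation_histogram` and `aggregate_histograms` are hypothetical.

```python
import numpy as np

def orientation_histogram(flow, box, n_bins=8):
    """Histogram of optical flow orientations inside a bounding box.

    flow: array of shape (H, W, 2) holding (dx, dy) for each pixel.
    box:  (x0, y0, x1, y1) in pixel coordinates, x1 and y1 exclusive.
    Votes are weighted by flow magnitude (an assumption of this sketch).
    """
    x0, y0, x1, y1 = box
    patch = flow[y0:y1, x0:x1].reshape(-1, 2)
    dx, dy = patch[:, 0], patch[:, 1]
    magnitude = np.hypot(dx, dy)
    # Map angles from [-pi, pi] to [0, 2*pi) and then to bin indices.
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    bins = (angle / (2 * np.pi) * n_bins).astype(int)
    bins = np.minimum(bins, n_bins - 1)  # guard against rounding up to n_bins
    hist = np.bincount(bins, weights=magnitude, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def aggregate_histograms(histograms, weights=None):
    """Weighted sum of per-frame histograms, normalized to sum to 1.

    weights: one weight per histogram; uniform if omitted (assumed scheme).
    """
    histograms = np.asarray(histograms, dtype=float)
    if weights is None:
        weights = np.ones(len(histograms))
    weights = np.asarray(weights, dtype=float)
    descriptor = (weights[:, None] * histograms).sum(axis=0)
    total = descriptor.sum()
    return descriptor / total if total > 0 else descriptor
```

Because the aggregation is a running weighted sum, the descriptor can in principle be updated one frame pair at a time, which is in the spirit of accumulating evidence from each incoming image.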
Cite this article
Lucena, M., Pérez de la Blanca, N. & Fuertes, J.M. Human action recognition based on aggregated local motion estimates. Machine Vision and Applications 23, 135–150 (2012). https://doi.org/10.1007/s00138-010-0305-9