Human Action Recognition from RGB-D Frames Based on Real-Time 3D Optical Flow Estimation

  • Conference paper
Biologically Inspired Cognitive Architectures 2012

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 196))

Abstract

Modern advances in intelligent agents have led to the concept of cognitive robots. A cognitive robot is able not only to perceive complex stimuli from the environment, but also to reason about them and to act coherently. Computer vision-based recognition systems serve this perception task, and they also extend beyond it to challenging applications such as video surveillance, human-computer interaction, content-based video analysis, and motion capture. In this context, we propose an automatic system for real-time human action recognition. We use the Kinect sensor and the tracking system of [1] to robustly detect and track people in the scene. Next, we estimate the 3D optical flow of the tracked people from point cloud data alone and summarize it with a 3D grid-based descriptor. Finally, temporal sequences of descriptors are classified with the Nearest Neighbor technique, and the overall application is evaluated on a newly created dataset. Experimental results show the effectiveness of the proposed approach.
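The pipeline described in the abstract lends itself to a compact sketch. The Python snippet below (not the authors' code) illustrates the two final stages under stated assumptions: per-point 3D flow vectors are averaged into the cells of a coarse 3D grid placed over the tracked person's bounding box, and temporal sequences of the resulting descriptors are classified with a 1-Nearest-Neighbor rule. The function names, the 4x4x4 grid size, cell-wise averaging, and the frame-wise Euclidean distance over fixed-length sequences are illustrative choices, not details taken from the paper.

```python
# Illustrative sketch only: a grid-based summary of 3D flow and 1-NN classification.
import numpy as np

def grid_flow_descriptor(points, flows, bbox_min, bbox_max, grid=(4, 4, 4)):
    """Average 3D flow vectors into the cells of a 3D grid over the person's
    bounding box (hypothetical layout; grid size is an assumption)."""
    points = np.asarray(points, dtype=float)   # (N, 3) point coordinates
    flows = np.asarray(flows, dtype=float)     # (N, 3) per-point 3D flow vectors
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)

    desc = np.zeros(grid + (3,))
    counts = np.zeros(grid)
    span = np.maximum(bbox_max - bbox_min, 1e-6)

    # Map every point to a grid cell index, clamped to the grid bounds.
    idx = np.floor((points - bbox_min) / span * np.array(grid)).astype(int)
    idx = np.clip(idx, 0, np.array(grid) - 1)

    for (i, j, k), f in zip(idx, flows):
        desc[i, j, k] += f
        counts[i, j, k] += 1

    nonzero = counts > 0
    desc[nonzero] /= counts[nonzero][:, None]  # cell-wise mean flow
    return desc.ravel()                        # flattened descriptor (4*4*4*3 values)

def classify_sequence(query_seq, train_seqs, train_labels):
    """1-Nearest-Neighbor over temporal sequences of descriptors. Assumes all
    sequences are resampled to the same length so a frame-wise Euclidean
    distance is well defined (an assumption, not prescribed by the paper)."""
    query = np.asarray(query_seq)
    dists = [np.linalg.norm(query - np.asarray(s)) for s in train_seqs]
    return train_labels[int(np.argmin(dists))]
```

In this reading, each frame of a tracked person yields one descriptor, a short window of frames yields a descriptor sequence, and the action label is taken from the closest training sequence; the real system may differ in the descriptor layout and distance measure.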


References

  1. Munaro, M., Basso, F., Menegatti, E.: Tracking people within groups with RGB-D data. In: Proc. of the International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal (2012)

  2. Johansson, G.: Visual perception of biological motion and a model for its analysis. Attention, Perception, & Psychophysics 14, 201–211 (1973), doi:10.3758/BF03212378

  3. Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E., Wheeler, R., Ng, A.: ROS: an open-source Robot Operating System. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA (2009)

  4. Basso, F., Munaro, M., Michieletto, S., Pagello, E., Menegatti, E.: Fast and Robust Multi-People Tracking from RGB-D Data for a Mobile Robot. In: Lee, S., Cho, H., Yoon, K.-J., Lee, J. (eds.) Intelligent Autonomous Systems 12. AISC, vol. 193, pp. 269–281. Springer, Heidelberg (2012)

  5. Carlsson, S., Sullivan, J.: Action recognition by shape matching to key frames. In: IEEE Computer Society Workshop on Models versus Exemplars in Computer Vision (2001)

  6. Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 984–989 (June 2005)

  7. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Proc. Tenth IEEE Int. Conf. Computer Vision, ICCV 2005, vol. 2, pp. 1395–1402 (2005)

  8. Rusu, R.B., Bandouch, J., Meier, F., Essa, I.A., Beetz, M.: Human action recognition using global point feature histograms and action shapes. Advanced Robotics 23(14), 1873–1908 (2009)

  9. Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: Proceedings of the Ninth IEEE International Conference on Computer Vision, vol. 2, pp. 726–733 (October 2003)

  10. Yacoob, Y., Black, M.J.: Parameterized modeling and recognition of activities. In: Sixth International Conference on Computer Vision, pp. 120–127 (January 1998)

  11. Ali, S., Shah, M.: Human action recognition in videos using kinematic features and multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(2), 288–303 (2010)

  12. Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: Proc. Tenth IEEE Int. Conf. Computer Vision, ICCV 2005, vol. 1, pp. 166–173 (2005)

  13. Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (June 2008)

  14. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA 2007, pp. 357–360. ACM, New York (2007)

  15. Laptev, I., Lindeberg, T.: Space-time interest points. In: Proc. Ninth IEEE Int. Conf. Computer Vision, pp. 432–439 (2003)

  16. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (2008)

  17. Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: Proc. 2nd Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)

  18. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proc. 17th Int. Conf. Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36 (2004)

  19. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vision 79, 299–318 (2008)

  20. Kläser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: British Machine Vision Conference, pp. 995–1004 (September 2008)

  21. Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104(2), 249–257 (2006)

  22. Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3D points. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–14 (June 2010)

  23. Holte, M.B., Moeslund, T.B.: View invariant gesture recognition using 3D motion primitives. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, pp. 797–800 (March–April 2008)

  24. Holte, M.B., Moeslund, T.B., Nikolaidis, N., Pitas, I.: 3D human action recognition for multi-view camera systems. In: 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 342–349 (May 2011)

  25. Sung, J., Ponce, C., Selman, B., Saxena, A.: Human activity detection from RGBD images. In: Plan, Activity, and Intent Recognition. AAAI Workshops, vol. WS-11-16. AAAI (2011)

  26. Sung, J., Ponce, C., Selman, B., Saxena, A.: Unstructured human activity detection from RGBD images. In: International Conference on Robotics and Automation, ICRA (2012)

  27. Yang, X., Tian, Y.: EigenJoints-based action recognition using Naive-Bayes-Nearest-Neighbor. In: IEEE Workshop on CVPR for Human Activity Understanding from 3D Data (2012)

  28. Zhang, H., Parker, L.E.: 4-dimensional local spatio-temporal features for human activity recognition. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2044–2049 (September 2011)

  29. Ni, B., Wang, G., Moulin, P.: RGBD-HuDaAct: A color-depth video database for human daily activity recognition. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1147–1153 (November 2011)

  30. Popa, M., Koc, A.K., Rothkrantz, L.J.M., Shan, C., Wiggers, P.: Kinect Sensing of Shopping Related Actions. In: Wichert, R., Van Laerhoven, K., Gelissen, J. (eds.) AmI 2011. CCIS, vol. 277, pp. 91–100. Springer, Heidelberg (2012)

  31. Schindler, K., van Gool, L.: Action snippets: How many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (June 2008)

Author information

Correspondence to Gioia Ballin.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ballin, G., Munaro, M., Menegatti, E. (2013). Human Action Recognition from RGB-D Frames Based on Real-Time 3D Optical Flow Estimation. In: Chella, A., Pirrone, R., Sorbello, R., Jóhannsdóttir, K. (eds) Biologically Inspired Cognitive Architectures 2012. Advances in Intelligent Systems and Computing, vol 196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34274-5_17

  • DOI: https://doi.org/10.1007/978-3-642-34274-5_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34273-8

  • Online ISBN: 978-3-642-34274-5

  • eBook Packages: Engineering (R0)
