
Robust information fusion in the DOHT paradigm for real-time action detection

  • Original Research Paper
Published in: Journal of Real-Time Image Processing

Abstract

In the increasingly explored domain of action analysis, our work focuses on action detection—i.e., segmentation and classification—in the context of real applications. The Hough transform paradigm fits such applications well. In this paper, we extend the deeply optimized Hough transform (DOHT) paradigm to handle various feature types and to merge information provided by multiple sensors—e.g., RGB sensors, depth sensors and skeleton data. To this end, we propose and compare three fusion methods applied at different levels of the algorithm, one of which is robust to data losses and, thus, to sensor failure. We study in depth the influence of the merged features on the algorithm's accuracy. Finally, since we target real-time applications such as human interactions, we investigate the latency and computation time of the proposed method.
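The fusion-at-voting-level idea mentioned in the abstract can be illustrated with a minimal sketch: each sensor channel (RGB, depth, skeleton) produces a Hough vote map over candidate action positions, the maps are normalized per channel, and only the channels that actually delivered data are summed. All function and variable names below are hypothetical illustrations, not the paper's implementation; the normalization and weighting scheme is an assumption chosen to show why such late fusion degrades gracefully when a sensor fails.

```python
import numpy as np

def fuse_vote_maps(vote_maps, weights=None):
    """Fuse per-channel Hough vote maps (hypothetical sketch).

    vote_maps: dict mapping a channel name (e.g., "rgb", "depth",
    "skeleton") to a 1-D array of votes over temporal action centers.
    A channel whose sensor failed maps to None and is simply skipped,
    so the fusion degrades gracefully instead of breaking.
    """
    available = {c: v for c, v in vote_maps.items() if v is not None}
    if not available:
        raise ValueError("no sensor delivered votes")
    if weights is None:
        weights = {c: 1.0 for c in available}

    fused = None
    total_weight = 0.0
    for channel, votes in available.items():
        v = np.asarray(votes, dtype=float)
        total = v.sum()
        if total > 0:
            v = v / total  # normalize so no channel dominates by raw scale
        w = weights.get(channel, 1.0)
        fused = w * v if fused is None else fused + w * v
        total_weight += w
    return fused / total_weight

# Example: the skeleton sensor failed (None), yet fusion still works
maps = {"rgb": [0.2, 0.6, 0.2], "depth": [0.1, 0.8, 0.1], "skeleton": None}
fused = fuse_vote_maps(maps)
# The fused map still peaks at the middle candidate position
```

Because each available channel is renormalized before summing, the loss of one sensor only removes that channel's evidence rather than corrupting the combined vote map, which is the property the abstract attributes to its failure-robust fusion variant.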

Figs. 1–12




Author information


Correspondence to Geoffrey Vaquette.


Cite this article

Vaquette, G., Achard, C. & Lucat, L. Robust information fusion in the DOHT paradigm for real-time action detection. J Real-Time Image Proc 16, 1511–1524 (2019). https://doi.org/10.1007/s11554-016-0660-5
