
Robust information fusion in the DOHT paradigm for real-time action detection

  • Original Research Paper
Published in: Journal of Real-Time Image Processing

Abstract

In the increasingly explored domain of action analysis, our work focuses on action detection—i.e., segmentation and classification—in the context of real applications. The Hough transform paradigm fits such applications well. In this paper, we extend the deeply optimized Hough transform (DOHT) paradigm to handle various feature types and to merge information provided by multiple sensors—e.g., RGB sensors, depth sensors and skeleton data. To this end, we propose and compare three fusion methods applied at different levels of the algorithm, one of which is robust to data losses and, thus, to sensor failure. We study in depth the influence of the merged features on the algorithm's accuracy. Finally, since we target real-time applications such as human interactions, we investigate the latency and computation time of the proposed method.
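The fusion-at-voting-level idea mentioned in the abstract can be illustrated with a minimal sketch: each sensor channel (RGB, depth, skeleton) produces a Hough vote map over candidate action positions, the maps are normalized per channel, and only the channels that actually delivered data are summed. All function and variable names below are hypothetical illustrations, not the paper's implementation; the normalization and weighting scheme is an assumption chosen to show why such late fusion degrades gracefully when a sensor fails.

```python
import numpy as np

def fuse_vote_maps(vote_maps, weights=None):
    """Fuse per-channel Hough vote maps (hypothetical sketch).

    vote_maps: dict mapping a channel name (e.g., "rgb", "depth",
    "skeleton") to a 1-D array of votes over temporal action centers.
    A channel whose sensor failed maps to None and is simply skipped,
    so the fusion degrades gracefully instead of breaking.
    """
    available = {c: v for c, v in vote_maps.items() if v is not None}
    if not available:
        raise ValueError("no sensor delivered votes")
    if weights is None:
        weights = {c: 1.0 for c in available}

    fused = None
    total_weight = 0.0
    for channel, votes in available.items():
        v = np.asarray(votes, dtype=float)
        total = v.sum()
        if total > 0:
            v = v / total  # normalize so no channel dominates by raw scale
        w = weights.get(channel, 1.0)
        fused = w * v if fused is None else fused + w * v
        total_weight += w
    return fused / total_weight

# Example: the skeleton sensor failed (None), yet fusion still works
maps = {"rgb": [0.2, 0.6, 0.2], "depth": [0.1, 0.8, 0.1], "skeleton": None}
fused = fuse_vote_maps(maps)
# The fused map still peaks at the middle candidate position
```

Because each available channel is renormalized before summing, the loss of one sensor only removes that channel's evidence rather than corrupting the combined vote map, which is the property the abstract attributes to its failure-robust fusion variant.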

Figs. 1–12




Author information


Correspondence to Geoffrey Vaquette.


Cite this article

Vaquette, G., Achard, C. & Lucat, L. Robust information fusion in the DOHT paradigm for real-time action detection. J Real-Time Image Proc 16, 1511–1524 (2019). https://doi.org/10.1007/s11554-016-0660-5
