Abstract
This paper presents a novel visual representation, called orderlets, for real-time human action recognition with depth sensors. An orderlet is a mid-level feature that captures the ordinal pattern among a group of low-level features. For skeletons, an orderlet captures a specific spatial relationship among a group of joints. For a depth map, an orderlet characterizes a comparative relationship of the shape information among a group of subregions. The orderlet representation has two useful properties. First, it is insensitive to small noise, since an orderlet depends only on the comparative relationship among individual features. Second, it is a frame-level representation and is thus suitable for real-time online action recognition. Experimental results demonstrate its superior performance on online action recognition and cross-environment action recognition.
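The ordinal idea can be illustrated with a minimal sketch. The abstract does not give the formal definition, so the encoding below is an assumption: the orderlet response is taken to be the position of the largest feature within a chosen group. The function name, the feature values, and the joint group are all hypothetical and serve only to show why a comparison-based feature tolerates small noise.

```python
import random


def orderlet_response(features, group):
    """Return the position within `group` of the largest selected feature.

    Assumed encoding (not taken from the paper): only the ordering of the
    selected features matters, not their magnitudes.
    """
    values = [features[i] for i in group]
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical per-joint scalar features for one frame.
frame = [0.10, 0.85, 0.40, 1.20, 0.95]
group = (1, 3, 4)  # hypothetical group of three joints

clean = orderlet_response(frame, group)

# Perturb every feature by at most 0.02. The largest selected feature leads
# the runner-up by 0.25, so this noise cannot change which one is largest.
rng = random.Random(0)
noisy_frame = [x + rng.uniform(-0.02, 0.02) for x in frame]
noisy = orderlet_response(noisy_frame, group)

print(clean, noisy)
```

Because the response is computed from a single frame, it can be evaluated as each frame arrives, which is the property the abstract links to online recognition.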
G. Yu: This work was done while Gang Yu was an intern at Microsoft Research. This work is supported in part by a Singapore MoE Tier-1 grant.
Notes
1. The dataset can be downloaded from http://research.microsoft.com/en-us/um/people/zliu/ActionRecoRsrc/default.htm.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yu, G., Liu, Z., Yuan, J. (2015). Discriminative Orderlet Mining for Real-Time Recognition of Human-Object Interaction. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision -- ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_4
DOI: https://doi.org/10.1007/978-3-319-16814-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)