Abstract
In this paper, we address the challenging problem of cross-view action recognition. The key to this problem is finding the correspondence between the source and target views, which we establish in two stages. First, we construct a Dual-Codebook for the two views, composed of two codebooks corresponding to the source and target views, respectively. Each codeword in one codebook has a corresponding codeword in the other, unlike traditional methods that build independent codebooks for the two views. We propose an effective co-clustering algorithm based on semi-nonnegative matrix factorization to derive the Dual-Codebook. With the Dual-Codebook, an action can be represented as a Bag-of-Dual-Codes (BoDC), whether it is observed from the source view or the target view. The Dual-Codebook thus establishes a codebook-to-codebook correspondence, which is the foundation for the second stage. In the second stage, we observe that, although the appearance of an action changes significantly with viewpoint, the temporal relationship between the atom actions within an action remains stable across views. We therefore propose a hierarchical transfer framework that obtains a feature-to-feature correspondence at the atom level between the source and target views. The framework is built on a temporal structure that effectively captures the temporal relationships between atom actions within an action, and it performs transfer at the atom level over multiple timescales, whereas most existing methods perform only video-level transfer. We carry out a series of experiments on the IXMAS dataset; the results demonstrate that our method achieves superior performance compared with state-of-the-art approaches.
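The Dual-Codebook is derived with a co-clustering algorithm built on semi-nonnegative matrix factorization (semi-NMF). As a rough illustration of that building block only (not the authors' co-clustering algorithm), the sketch below implements the standard semi-NMF multiplicative updates in Python; the function name, initialization, and iteration settings are assumptions made for illustration.

```python
# A minimal sketch of semi-nonnegative matrix factorization (semi-NMF),
# the building block used to co-cluster descriptors into the Dual-Codebook.
# Illustrative only: follows the widely used multiplicative update rules;
# names and convergence settings are assumptions, not the paper's code.
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize X (d x n) as F @ G.T with G >= 0 (F is unconstrained)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    G = np.abs(rng.standard_normal((n, k)))          # nonnegative init
    pos = lambda A: (np.abs(A) + A) / 2.0            # positive part
    neg = lambda A: (np.abs(A) - A) / 2.0            # negative part
    for _ in range(n_iter):
        # F-update: least-squares solution given the current G
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # G-update: multiplicative rule that keeps G nonnegative
        XtF = X.T @ F
        FtF = F.T @ F
        num = pos(XtF) + G @ neg(FtF)
        den = neg(XtF) + G @ pos(FtF) + eps
        G *= np.sqrt(num / den)
    return F, G

# Toy usage: cluster 50 local descriptors of dimension 20 into k = 5 codewords.
X = np.random.default_rng(1).standard_normal((20, 50))
F, G = semi_nmf(X, k=5)
labels = G.argmax(axis=1)   # soft memberships in G give a hard assignment
```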
Acknowledgement
This work is supported by the National Natural Science Foundation of China (No. 61172141), the Key Projects in the National Science & Technology Pillar Program during the 12th Five-Year Plan Period (No. 2012BAK16B06), and the Science and Technology Program of Guangzhou, China (No. 2014J4100092).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, C., Zheng, H., Lai, J. (2015). Cross-view Action Recognition via Dual-Codebook and Hierarchical Transfer Framework. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_38
DOI: https://doi.org/10.1007/978-3-319-16814-2_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)