
Reverse Testing Image Set Model Based Multi-view Human Action Recognition

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)


Abstract

Recognizing human activities from videos has become a hot research topic in computer vision, but many studies show that action recognition based on a single view cannot achieve satisfactory performance. Many researchers have therefore turned their attention to multi-view action recognition, yet how to mine the relationships among different views remains a challenging problem. Image-set-based video face recognition has demonstrated that image-set algorithms can effectively exploit the complementary properties of images from different views and achieve satisfactory performance. Inspired by this, we employ image sets to mine the relationships among views for multi-view action recognition. However, studies also show that the number of samples in the gallery and query sets affects the performance of image-set-based video face recognition, where several tens to several hundreds of samples are typically available; in multi-view action recognition, each query set contains only 3–5 views (samples), which limits the effectiveness of the image-set approach.

To address these issues, we propose a reverse testing image set model (RTISM) for multi-view human action recognition. First, we extract dense trajectory features for each camera, construct a shared codebook over all cameras with k-means, and encode the features of each camera with a Bag-of-Words (BoW) weighting scheme. Second, for each query set we compute a compound distance to every image subset in the gallery set and add the nearest image subset (called the RTIS) to the query set. Finally, RTISM is optimized so that the query set and the RTIS are jointly reconstructed over the gallery set; in this way, the relationships among different actions in the gallery set and the complementary properties of different samples in the query set are exploited simultaneously. Large-scale experiments on two public multi-view 3D action datasets, Northwestern-UCLA and CVS-MV-RGBD-Single, show that reconstructing the query set over the gallery set is very effective, that adding the RTIS to the query set is very helpful for classification, and that the performance of RTISM is comparable to state-of-the-art methods.
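To make the pipeline concrete, the following Python sketch illustrates the three steps under explicit assumptions: dense trajectory descriptors are taken as given NumPy arrays, the compound set-to-set distance is stood in for by the mean of pairwise Euclidean distances, and the joint reconstruction over the gallery is approximated by ridge-regularized least squares. The function names and these stand-ins are illustrative only, not the exact formulation of the paper.

```python
# Minimal sketch of shared-codebook BoW coding, RTIS selection, and
# reconstruction over the gallery. The compound distance and the
# reconstruction objective are assumptions, not the authors' formulation.
import numpy as np
from sklearn.cluster import KMeans

def build_shared_codebook(all_descriptors, k=1000, seed=0):
    """k-means codebook shared by all cameras (descriptors stacked, n x d)."""
    return KMeans(n_clusters=k, random_state=seed).fit(all_descriptors).cluster_centers_

def bow_encode(descriptors, codebook):
    """Quantize local descriptors (n x d) against the codebook (k x d)
    and return an L1-normalized Bag-of-Words histogram."""
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(np.argmin(dists, axis=1),
                       minlength=codebook.shape[0]).astype(float)
    return hist / max(hist.sum(), 1e-12)

def compound_distance(query_set, gallery_subset):
    """Assumed set-to-set distance: mean of pairwise Euclidean distances
    between BoW vectors (rows) of the two sets."""
    d = np.linalg.norm(query_set[:, None, :] - gallery_subset[None, :, :], axis=-1)
    return d.mean()

def select_rtis(query_set, gallery_subsets):
    """Pick the gallery image subset nearest to the query set (the RTIS)."""
    dists = [compound_distance(query_set, g) for g in gallery_subsets]
    return gallery_subsets[int(np.argmin(dists))]

def reconstruct_over_gallery(augmented_query, gallery_matrix, lam=1e-2):
    """Ridge-regularized reconstruction of the augmented query set
    (query views plus RTIS) over the whole gallery; a stand-in for the
    RTISM optimization."""
    G = gallery_matrix.T                                  # d x n_gallery
    A = G.T @ G + lam * np.eye(G.shape[1])
    coeffs = np.linalg.solve(A, G.T @ augmented_query.T)  # n_gallery x m
    residual = np.linalg.norm(G @ coeffs - augmented_query.T)
    return coeffs, residual
```

In this sketch a query set is a small matrix whose rows are the BoW vectors of its 3–5 views; appending the rows of the selected RTIS enlarges the query set before the joint reconstruction, which is the role the RTIS plays in the abstract.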





Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61572357, 61502337, 61472275, 61201234, 61202168), the Tianjin Municipal Natural Science Foundation (Nos. 14JCZDJC31700, 13JCQNJC0040), and the Tianjin Education Committee Science and Technology Development Foundation (No. 20120802).

Author information


Corresponding author

Correspondence to Z. Gao.



Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, Z., Zhang, Y., Zhang, H., Xu, G.P., Xue, Y.B. (2016). Reverse Testing Image Set Model Based Multi-view Human Action Recognition. In: Tian, Q., Sebe, N., Qi, G.-J., Huet, B., Hong, R., Liu, X. (eds.) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol. 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_33


  • DOI: https://doi.org/10.1007/978-3-319-27671-7_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science (R0)
