ABSTRACT
The apparent motion information of an action may vary dramatically from one view to another, making the transfer of knowledge across views a core challenge of action recognition. Recent approaches compensate for this lack of generalization with large-scale datasets; indeed, most state-of-the-art methods today require large amounts of training data and incur a high computational cost during training. We propose a novel technique that leverages pre-trained features and refines them to minimize view-related information through adversarial training, inspired by domain adaptation methods. Our method recognizes actions from unfamiliar viewpoints and achieves strong results with substantially less training data than is necessary to train state-of-the-art cross-view methods.
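The adversarial refinement described above is typically realized with a gradient-reversal layer between the feature extractor and a view discriminator: the discriminator is trained to predict the viewpoint, while the reversed gradient pushes the features to become view-agnostic. The following is a minimal numeric sketch of that idea in plain Python, with hand-computed gradients on a toy one-dimensional model; it is an illustration of the general mechanism, not the paper's actual architecture.

```python
# Toy illustration of a gradient-reversal layer (GRL), the standard
# mechanism behind adversarial view/domain removal. All names and the
# 1-D model below are hypothetical, chosen only for the demonstration.

def grl_backward(upstream_grad, lam=1.0):
    """GRL: identity in the forward pass, negated gradient in the backward pass."""
    return -lam * upstream_grad

# Tiny model: feature f = w * x, view discriminator d = v * f,
# squared-error view loss L = 0.5 * (d - y)^2.
w, v, x, y = 0.5, 2.0, 1.0, 0.0

f = w * x                 # forward through the feature extractor
d = v * f                 # forward through the view discriminator

dL_dd = d - y             # dL/dd for the squared-error loss
dL_df = dL_dd * v         # gradient reaching the feature

dL_dw_plain = dL_df * x                    # ordinary backprop into the extractor
dL_dw_reversed = grl_backward(dL_df) * x   # backprop through the reversal layer

# With the reversal, the extractor ascends the discriminator's loss,
# i.e., it is pushed to strip view information from the features.
assert dL_dw_reversed == -dL_dw_plain
```

The discriminator itself still receives the ordinary (non-reversed) gradient, so it keeps improving at predicting the view; only the feature extractor sees the flipped sign, which is what drives the two toward view-invariant representations.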