ABSTRACT
The apparent motion information of an action may vary dramatically from one view to another, making the transfer of knowledge across views a core challenge of action recognition. Recent approaches compensate for this lack of generalization with large-scale datasets; indeed, most state-of-the-art methods today require large amounts of training data and incur a high computational cost during training. We propose a novel technique that leverages pre-trained features and refines them to minimize view-related information through adversarial training, inspired by domain adaptation methods. Our method recognizes actions from unfamiliar viewpoints and achieves strong results with substantially less training data than is necessary to train state-of-the-art cross-view methods.
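The adversarial refinement described above is typically realized with a gradient-reversal layer between the feature extractor and a view discriminator: the discriminator is trained to predict the viewpoint, while the reversed gradient pushes the features to become view-agnostic. The following is a minimal numeric sketch of that idea in plain Python, with hand-computed gradients on a toy one-dimensional model; it is an illustration of the general mechanism, not the paper's actual architecture.

```python
# Toy illustration of a gradient-reversal layer (GRL), the standard
# mechanism behind adversarial view/domain removal. All names and the
# 1-D model below are hypothetical, chosen only for the demonstration.

def grl_backward(upstream_grad, lam=1.0):
    """GRL: identity in the forward pass, negated gradient in the backward pass."""
    return -lam * upstream_grad

# Tiny model: feature f = w * x, view discriminator d = v * f,
# squared-error view loss L = 0.5 * (d - y)^2.
w, v, x, y = 0.5, 2.0, 1.0, 0.0

f = w * x                 # forward through the feature extractor
d = v * f                 # forward through the view discriminator

dL_dd = d - y             # dL/dd for the squared-error loss
dL_df = dL_dd * v         # gradient reaching the feature

dL_dw_plain = dL_df * x                    # ordinary backprop into the extractor
dL_dw_reversed = grl_backward(dL_df) * x   # backprop through the reversal layer

# With the reversal, the extractor ascends the discriminator's loss,
# i.e., it is pushed to strip view information from the features.
assert dL_dw_reversed == -dL_dw_plain
```

The discriminator itself still receives the ordinary (non-reversed) gradient, so it keeps improving at predicting the view; only the feature extractor sees the flipped sign, which is what drives the two toward view-invariant representations.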