DOI: 10.1145/3412841.3441981

research-article

Adversarial feature refinement for cross-view action recognition

Published: 22 April 2021

ABSTRACT

The apparent motion of an action can vary dramatically from one viewpoint to another, making the transfer of knowledge across views a core challenge in action recognition. Recent work has relied on large-scale datasets to compensate for this lack of generalization; indeed, most state-of-the-art methods require large amounts of training data and incur a high computational cost during training. We propose a novel technique that refines pre-trained features to minimize view-related information through adversarial training, inspired by domain adaptation methods. Our method recognizes actions from unfamiliar viewpoints and achieves excellent results with substantially less training data than state-of-the-art cross-view methods require.
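The abstract only outlines the approach, so the following is a hedged NumPy sketch of the underlying idea: refine pre-trained features so that an adversarial view discriminator is defeated while an action classifier still succeeds, in the spirit of DANN-style gradient reversal from the domain adaptation literature. Every concrete choice below (a linear refiner, two linear heads, the dimensions, the reversal weight `lam`) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

# Illustrative sketch only: a linear "refiner" R is trained so that an
# action head A can classify the refined features while a view head V
# cannot. The view gradient is reversed before reaching R, the classic
# DANN recipe. All names and numbers here are assumptions.

rng = np.random.default_rng(0)
N, D = 256, 8                                   # clips, feature dimension

# Synthetic "pre-trained" features: the action signal lives in dims 0-1,
# and the camera view adds a large offset along dim 7.
z = rng.normal(size=(N, D))
y_act = (z[:, 0] + z[:, 1] > 0).astype(int)     # 2 action classes
y_view = rng.integers(0, 2, size=N)             # 2 camera views
x = z.copy()
x[:, 7] += 3.0 * y_view                         # view-dependent shift

R = np.eye(D)                                   # refiner, init = identity
A = rng.normal(scale=0.1, size=(D, 2))          # action head
V = rng.normal(scale=0.1, size=(D, 2))          # view discriminator

def softmax(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ce(logits, y):
    """Mean cross-entropy and its gradient w.r.t. the logits."""
    p = softmax(logits)
    loss = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    g = p
    g[np.arange(len(y)), y] -= 1.0
    return loss, g / len(y)

lam, lr = 1.0, 0.2
for _ in range(600):
    h = x @ R                                   # refined features
    l_act, g_act = ce(h @ A, y_act)
    l_view, g_view = ce(h @ V, y_view)
    # Refiner update: the action gradient flows normally, while the view
    # gradient is REVERSED (scaled by -lam), so R learns to destroy view
    # information while preserving action information.
    g_h = g_act @ A.T - lam * (g_view @ V.T)
    R_new = R - lr * (x.T @ g_h)
    # Each head simply descends its own loss.
    A -= lr * (h.T @ g_act)
    V -= lr * (h.T @ g_view)
    R = R_new

h = x @ R
act_acc = ((h @ A).argmax(1) == y_act).mean()
view_acc = ((h @ V).argmax(1) == y_view).mean()
print(f"action accuracy {act_acc:.2f}, view accuracy {view_acc:.2f}")
```

In an autodiff framework the same effect is obtained with a gradient-reversal layer between the feature extractor and the discriminator, which multiplies the backward gradient by -lam while leaving the forward pass unchanged.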


  • Published in

    SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
    March 2021, 2075 pages
    ISBN: 9781450381048
    DOI: 10.1145/3412841

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher: Association for Computing Machinery, New York, NY, United States


    Acceptance Rates

    Overall acceptance rate: 1,650 of 6,669 submissions, 25%
