DOI: 10.1145/3507548.3507565
research-article

AZY-GCN: Multi-scale feature suppression attentional diagram convolutional network for human pose prediction

Published: 09 March 2022

ABSTRACT

Because future human motion is random and non-periodic, human pose prediction has always been a challenging task. Recent research has shown that graph convolution is an effective way to capture the dynamic relationships between body joints, which benefits pose prediction. Graph convolution can also abstract the human pose to obtain a multi-scale pose set; as the level of abstraction increases, the pose motion becomes more stable. Although average prediction accuracy has improved significantly in recent years, the application of graph convolution to pose prediction still leaves much room for exploration. In this work, we propose a new multi-scale feature suppression attention graph convolutional network (AZY-GCN) for end-to-end human pose prediction. We use GCNs to extract features from the fine-grained scale to the coarse-grained scale and then back from the coarse-grained scale to the fine-grained scale. We then combine and decode the features extracted at each scale to obtain the residual between the input and the target pose. We also apply intermediate supervision to all predicted poses so that the network learns more representative features. In addition, we propose a new feature suppression attention module (FISA-block), which effectively extracts relevant information from neighboring nodes while suppressing the noise introduced by poor GCN learning. We evaluated the proposed method on the public Human3.6M and CMU Mocap datasets. Extensive experiments show that our method achieves competitive performance.
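The abstract gives no layer definitions, so the following is only a minimal sketch of the general idea it describes: graph convolution at a fine scale, pooling to a coarser scale, fusing both scales, and decoding a residual that is added to the input pose. Everything concrete here is an assumption made for illustration: the 4-joint chain skeleton, the pooling matrix, the layer sizes, the random weights, and the function names are hypothetical, and the normalization follows the standard GCN formulation of Kipf and Welling rather than anything confirmed for AZY-GCN. The FISA-block and intermediate supervision are not modeled.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (standard GCN form)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def gcn_layer(x, a_norm, weight):
    """One graph convolution: aggregate neighbors, project, apply tanh."""
    return np.tanh(a_norm @ x @ weight)

# Hypothetical 4-joint chain skeleton 0-1-2-3 (not a skeleton from the paper).
fine_adj = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]], dtype=float)
fine_norm = normalize_adjacency(fine_adj)

# Hypothetical pooling: average joints {0, 1} and {2, 3} into two coarse nodes.
pool = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5]])
unpool = (pool > 0).T.astype(float)               # copy coarse features back
coarse_norm = normalize_adjacency(np.array([[0, 1], [1, 0]], dtype=float))

# Random weights stand in for trained parameters (assumption).
rng = np.random.default_rng(0)
feat_dim, hidden = 3, 8
w_fine = rng.normal(scale=0.1, size=(feat_dim, hidden))
w_coarse = rng.normal(scale=0.1, size=(hidden, hidden))
w_out = rng.normal(scale=0.1, size=(2 * hidden, feat_dim))

def predict_pose(pose):
    """Fine -> coarse -> fine feature extraction, then a residual update."""
    fine = gcn_layer(pose, fine_norm, w_fine)                 # (4, hidden)
    coarse = gcn_layer(pool @ fine, coarse_norm, w_coarse)    # (2, hidden)
    fused = np.concatenate([fine, unpool @ coarse], axis=1)   # (4, 2 * hidden)
    residual = fine_norm @ fused @ w_out                      # (4, feat_dim)
    return pose + residual                                    # input + residual

pose = rng.normal(size=(4, feat_dim))    # one frame of 3D joint positions
prediction = predict_pose(pose)
```

Predicting a residual rather than the absolute pose means that a network whose decoder outputs zeros reproduces the input pose unchanged, which is a reasonable starting point for short-horizon motion.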

Published in

CSAI '21: Proceedings of the 2021 5th International Conference on Computer Science and Artificial Intelligence
December 2021
437 pages
ISBN: 9781450384155
DOI: 10.1145/3507548

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited