ABSTRACT
Because future human motion is random and non-periodic, predicting human pose has always been a challenging task. Recent research has shown that graph convolution is an effective way to capture the dynamic relationships between the joints of the human body, which benefits pose prediction. Graph convolution can also abstract the human pose to obtain a multi-scale set of poses, and the motion becomes more stable as the level of abstraction increases. Although average prediction accuracy has improved significantly in recent years, the application of graph convolution to pose prediction still leaves much room for exploration. In this work, we propose a new multi-scale feature suppression attention graph convolutional network (AZY-GCN) for end-to-end human pose prediction. We use a GCN to extract features from the fine-grained scale to the coarse-grained scale and then from the coarse-grained scale back to the fine-grained scale. We then combine and decode the features extracted at each scale to obtain the residual between the input pose and the target pose. We also apply intermediate supervision to all predicted poses so that the network learns more representative features. In addition, we propose a new feature suppression attention module (FISA-block), which effectively extracts relevant information from neighboring nodes while suppressing the noise introduced by poorly learned GCN features. We evaluated the proposed method on the public Human3.6M and CMU Mocap datasets. Extensive experiments show that our method achieves competitive performance.
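The fine-to-coarse-to-fine flow and the residual output described above can be illustrated with a minimal sketch. This is not the AZY-GCN implementation: the function names, the joint grouping, the averaging used for coarsening, and the single shared weight matrix are all assumptions made for illustration, and the FISA-block and intermediate supervision are omitted.

```python
# Illustrative sketch only. All names, shapes, and the joint grouping are
# assumptions; this is not the authors' AZY-GCN code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so each node averages its neighborhood."""
    n = len(adj)
    loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in loops]

def graph_conv(features, adj, weight):
    """One graph convolution: aggregate neighbor features, then apply a linear map."""
    return matmul(matmul(normalize_adjacency(adj), features), weight)

def coarsen(features, groups):
    """Fine -> coarse: average the joints in each group (assumed pooling rule)."""
    dim = len(features[0])
    return [[sum(features[j][d] for j in g) / len(g) for d in range(dim)] for g in groups]

def refine(coarse, groups, num_joints):
    """Coarse -> fine: copy each coarse node's features back to its joints."""
    fine = [None] * num_joints
    for node, group in zip(coarse, groups):
        for j in group:
            fine[j] = list(node)
    return fine

def predict_pose(last_pose, adj, groups, weight):
    """Predict the next pose as the last input pose plus a decoded residual."""
    fine = graph_conv(last_pose, adj, weight)        # fine-scale features
    coarse = coarsen(fine, groups)                   # fine -> coarse
    back = refine(coarse, groups, len(last_pose))    # coarse -> fine
    # Combine the scales (here a plain sum) to form the residual.
    residual = [[f + b for f, b in zip(fr, br)] for fr, br in zip(fine, back)]
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(last_pose, residual)]

# A hypothetical 4-joint chain with 3D coordinates, grouped into two body parts.
pose = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0], [6.0, 6.0, 6.0]]
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
groups = [[0, 1], [2, 3]]
zero_weight = [[0.0] * 3 for _ in range(3)]
next_pose = predict_pose(pose, adj, groups, zero_weight)
```

With an all-zero weight matrix the residual vanishes, so the sketch returns the last observed pose unchanged. This shows why predicting a residual is a convenient formulation: the network only has to model the change relative to the input.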