Abstract
We present a rank statistic adaptive multi-stage pruning method for finding lightweight neural networks for 3D human mesh recovery while minimizing the accuracy drop. We observe that some feature maps exhibit prominent low-rank patterns regardless of the input human image. Furthermore, even after pruning, feature channels that should have been pruned according to the pruning criterion frequently re-appear at test time. Based on these observations, we design rank statistic adaptive multi-stage pruning, which allows us to prune more filters while recovering mesh reconstruction accuracy. We demonstrate that, for DenseNet-121, 60.0% of parameters and 67.9% of FLOPs are saved while maintaining accuracy comparable to that of the original full model. This is a notable improvement over the competing method based on L1 filter pruning, whose error increases by 17.55% at the same pruning rate.
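To make the idea of a rank statistic concrete, the sketch below scores each feature channel by the mean matrix rank of its feature map over several inputs and marks the lowest-scoring channels for pruning. It is an illustration only, not the authors' implementation: the abstract does not specify the exact statistic, so the mean-rank criterion (in the spirit of rank-based filter pruning), the function names, the `prune_ratio` parameter, and the toy data are all assumptions. Only the Python standard library is used; in practice a routine such as `numpy.linalg.matrix_rank` would replace the hand-written elimination.

```python
from statistics import mean


def matrix_rank(matrix, tol=1e-9):
    """Rank of a 2D list of numbers via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def mean_rank_per_channel(feature_maps):
    """feature_maps[i][c] is the H x W map of channel c for input i (assumed layout)."""
    n_channels = len(feature_maps[0])
    return [
        mean(matrix_rank(sample[c]) for sample in feature_maps)
        for c in range(n_channels)
    ]


def channels_to_prune(feature_maps, prune_ratio):
    """Indices of the channels with the lowest mean rank (hypothetical criterion)."""
    scores = mean_rank_per_channel(feature_maps)
    n_prune = int(len(scores) * prune_ratio)
    order = sorted(range(len(scores)), key=lambda c: scores[c])
    return sorted(order[:n_prune])


# Toy data: 2 inputs, 3 channels, 2 x 2 feature maps.
toy = [
    [[[1, 0], [0, 1]], [[1, 2], [2, 4]], [[0, 0], [0, 0]]],
    [[[2, 1], [1, 3]], [[3, 3], [3, 3]], [[1, 1], [0, 0]]],
]
scores = mean_rank_per_channel(toy)
pruned = channels_to_prune(toy, prune_ratio=0.5)
```

On the toy data, channel 2 has the lowest mean rank and is the one selected. A multi-stage scheme would repeat such a selection over several rounds; that loop, and any fine-tuning between rounds, is omitted here.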
Notes
We abuse notation by using the same symbol for the parameter and the projection function interchangeably, as in [13].
References
Andriluka, M., Pishchulin, L., Gehler, P. V., Schiele, B.: 2d human pose estimation: New benchmark and state of the art analysis. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2014)
Cho, J., Youwang, K., Oh, T.-H.: Cross-attention of disentangled modalities for 3d human mesh recovery with transformers. In: European Conference on Computer Vision (ECCV). Springer, (2022)
Han, S., Pool, J., Tran, J., Dally, W. J.: Learning both weights and connections for efficient neural networks. In: International Conference on Learning Representations (ICLR), (2015)
Hinton, G. E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv:1503.02531, (2015)
Huang, G., Liu, Z., Weinberger, K. Q.: Densely connected convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017)
Hyeon-Woo, N., Ye-Bin, M., Oh, T.-H.: Fedpara: Low-rank hadamard product for communication-efficient federated learning. In: International Conference on Learning Representations (ICLR), (2022)
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36:1325–1339, (2014)
Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: British Machine Vision Conference (BMVC), (2010)
Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2011)
Kanazawa, A., Black, M. J., Jacobs, D. W., Malik, J.: End-to-end recovery of human shape and pose. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018)
Kim, Y., Park, J., Jang, Y., Ali, M., Oh, T.-H., Bae, S.-H.: Distilling global and local logits with densely connected relations. In: IEEE International Conference on Computer Vision (ICCV), (2021)
Kocabas, M., Athanasiou, N., Black, M. J.: Vibe: Video inference for human body pose and shape estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June (2020)
Kolotouros, N., Pavlakos, G., Black, M. J., Daniilidis, K.: Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In: IEEE International Conference on Computer Vision (ICCV), (2019)
Krishnamoorthi, R.: Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv:1806.08342, (2018)
Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H. P.: Pruning filters for efficient convnets. In: International Conference on Learning Representations (ICLR), (2017)
Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2021)
Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: IEEE International Conference on Computer Vision (ICCV), (2021)
Lin, M., Ji, R., Wang, Y., Zhang, Y., Zhang, B., Tian, Y., Shao, L.: Hrank: Filter pruning using high-rank feature map. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2020)
Lin, T.-Y., Maire, M., Belongie, S. J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C. L.: Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV), (2014)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M. J.: SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, Oct. (2015)
Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O., Xu, W., Theobalt, C.: Monocular 3d human pose estimation in the wild using improved cnn supervision. 2017 International Conference on 3D Vision (3DV), (2017)
Mitsuno, K., Kurita, T.: Filter pruning using hierarchical group sparse regularization for deep convolutional neural networks. 2020 25th International Conference on Pattern Recognition (ICPR), pages 1089–1095, (2021)
Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient transfer learning. In: International Conference on Learning Representations (ICLR), (2017)
Oh, T.-H., Matsushita, Y., Tai, Y.-W., Kweon, I.S.: Fast randomized singular value thresholding for low-rank optimization. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(2), 376–391 (2018)
Renda, A., Frankle, J., Carbin, M.: Comparing rewinding and fine-tuning in neural network pruning. In: International Conference on Learning Representations (ICLR), (2020)
Tan, C. M. J., Motani, M.: Dropnet: Reducing neural network complexity via iterative pruning. In: International Conference on Machine Learning (ICML), (2020)
Tu, C.-H., Lee, J.-H., Chan, Y.-M., Chen, C.-S.: Pruning depthwise separable convolutions for mobilenet compression. In: 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, (2020)
von Marcard, T., Henschel, R., Black, M. J., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3d human pose in the wild using imus and a moving camera. In: European Conference on Computer Vision (ECCV), (2018)
Youwang, K., Ji-Yeon, K., Joo, K., Oh, T.-H.: Unified 3d mesh recovery of humans and animals by learning animal exercise. British Machine Vision Conference (BMVC), (2021)
Zhang, Z., Ganesh, A., Liang, X., Ma, Y.: Tilt: Transform invariant low-rank textures. Int. J. Comput. Vis. (IJCV) 99(1), 1–24 (2012)
Zhu, M., Gupta, S.: To prune, or not to prune: exploring the efficacy of pruning for model compression, (2018)
Acknowledgements
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00164860, Development of human digital twin technology based on dynamic behavior modeling and human-object-space interaction; No. 2022-0-00290, Visual Intelligence for Space-Time Understanding and Generation based on Multi-layered Visual Common Sense; No. 2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)). This research is a result of a study on the “HPC Support” Project, supported by the MSIT and NIPA.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical standards
The authors ensure objectivity and transparency in research and ensure that accepted principles of ethical and professional conduct have been followed.
Research involving Human Participants and/or Animals
Not applicable.
Informed consent
Not applicable.
About this article
Cite this article
Ryou, D., Youwang, K. & Oh, TH. Multi-stage adaptive rank statistic pruning for lightweight human 3D mesh recovery model. Vis Comput 40, 535–543 (2024). https://doi.org/10.1007/s00371-023-02798-x
DOI: https://doi.org/10.1007/s00371-023-02798-x