
SHetConv: target keypoint detection based on heterogeneous convolution neural networks

  • Regular Paper, Multimedia Systems

Abstract

Keypoint detection is an important research topic in target recognition and classification. This paper studies keypoint detection in images of Amur tigers and proposes a target keypoint detection method based on heterogeneous convolution neural networks. Because monitoring devices have limited storage capacity while the accuracy requirements are high, we propose a heterogeneous convolution, called SHetConv, composed of group convolution and standard convolution. We use two kinds of SHetConv: one reduces the computational cost, measured as the number of floating-point operations (FLOPs), and one enlarges the receptive field. To further improve the model, we propose a feature fusion module that makes full use of both the semantic and the spatial information in images. We evaluate the algorithm on the Tiger Pose Keypoint, CIFAR-10 and MPII datasets. The experimental results show that our method achieves better accuracy, recall and \(F_1\)-score than other state-of-the-art keypoint detection methods while substantially reducing the number of parameters and FLOPs. Specifically, the parameter count and FLOPs of our full model (scaled network + fusion module + SHet2) are 0.14 and 0.143 times those of the large HRNet-W48 model, respectively, and its \(F_1\)-score is 0.3% higher.
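This page does not reproduce the architectural details of SHetConv, but the description above (a heterogeneous layer built from group convolution and standard convolution) admits a straightforward rendering. The PyTorch module below is a minimal, hypothetical sketch of that idea, not the authors' implementation: a grouped 3x3 branch provides spatial context at roughly 1/groups of the standard cost, and a standard 1x1 branch restores full cross-channel mixing. The exact channel split, branch design, and the SHet1/SHet2 distinction in the paper may differ.

    # Hypothetical sketch of a heterogeneous convolution in the spirit of
    # SHetConv: a cheap grouped 3x3 branch plus a standard 1x1 branch.
    # Illustrates the group-conv + standard-conv idea only; the paper's
    # actual layer design is not given on this page.
    import torch
    import torch.nn as nn

    class SHetConvSketch(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, groups: int = 4):
            super().__init__()
            # Grouped 3x3: each group convolves in_ch/groups channels,
            # costing about 1/groups of a standard 3x3 convolution.
            self.group_conv = nn.Conv2d(in_ch, out_ch, 3, padding=1,
                                        groups=groups, bias=False)
            # Standard 1x1: full cross-channel mixing at low spatial cost.
            self.point_conv = nn.Conv2d(in_ch, out_ch, 1, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Every output channel sees both a wide spatial (grouped) view
            # and a full-channel (pointwise) view of the input.
            return self.group_conv(x) + self.point_conv(x)

    x = torch.randn(1, 64, 56, 56)
    print(SHetConvSketch(64, 64)(x).shape)  # torch.Size([1, 64, 56, 56])

A receptive-field-oriented variant in the same spirit could, for example, dilate or enlarge the grouped kernel; whether SHet2 does this is likewise an assumption, not something stated here.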


Notes

  1. The size of a convolution kernel is \(C \times K_{1} \times K_{1}\). Here, C is the number of channels of the kernel, which equals the number of channels of the feature maps being convolved, and \(K_{1}\) is the height and width of the kernel.
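To make the cost comparison concrete: a standard convolution producing an \(H \times W\) output with M kernels of size \(C \times K_{1} \times K_{1}\) costs about \(H \cdot W \cdot M \cdot C \cdot K_{1}^{2}\) multiply-accumulate operations, and a group convolution with g groups replaces C by C/g in that product. The quick check below uses hypothetical layer sizes (not values from the paper) to show why combining a grouped 3x3 with a standard 1x1 convolution cuts FLOPs sharply:

    # FLOP count (multiply-accumulates, bias ignored) for one conv layer.
    def conv_flops(h, w, c_in, c_out, k, groups=1):
        return h * w * c_out * (c_in // groups) * k * k

    std = conv_flops(56, 56, 64, 64, 3)             # standard 3x3
    het = (conv_flops(56, 56, 64, 64, 3, groups=4)  # grouped 3x3
           + conv_flops(56, 56, 64, 64, 1))         # standard 1x1
    print(het / std)  # ~0.36: about a third of the standard cost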


Acknowledgements

This work was supported by the Major Project of Technological Innovation 2030, "New Generation Artificial Intelligence" (2018AAA0100800), the National Natural Science Foundation of China (61872042, 61572077, 61972375), the Key Project of the Beijing Municipal Education Commission (KZ201911417048), the Premium Funding Project for Academic Human Resources Development in Beijing Union University (BPHR2020AZ01, BPH2020EZ01), and the Project of High-Level Teachers in Beijing Municipal Universities in the Period of the 13th Five-Year Plan (CIT&TCD201704069).

Author information


Corresponding author

Correspondence to Ning He.

Additional information

Communicated by B.-K. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Yin, X., He, N., Liu, X. et al. SHetConv: target keypoint detection based on heterogeneous convolution neural networks. Multimedia Systems 27, 519–529 (2021). https://doi.org/10.1007/s00530-020-00729-7
