PointDet++: an object detection framework based on human local features with transformer encoder

  • S.I.: Interpretation of Deep Learning
Neural Computing and Applications

Abstract

Object detection algorithms play an important role in chemical plant safety identification. In field-operation scenes of chemical plants, the targets to be detected are usually highly correlated with people, and most of them are small objects because of the long shooting distance. Conventional object detection algorithms use a strong backbone module to obtain global features, so they perform well on large targets; however, they are difficult to apply to small-target detection scenes because they make little use of local features. Making better use of local features is therefore the key to small-target detection: it is necessary not only to extract local features from the original data but also to model the positional relationships between them. To address this problem, we propose a new object detection framework named PointDet++. First, a trained pose estimation model is used to obtain the local features of the human body. Then, we reconstruct the local features and global features with a transformer encoder and graph convolution, respectively. In the output layer, we integrate local and global features according to the target to be detected, improving the detection performance of the proposed model. Specifically, our framework outperforms the state of the art by 10.3 AP on a field-operation dataset collected in a chemical plant.
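To make the fusion idea in the abstract concrete, the sketch below shows one plausible reading of the local-feature step: per-keypoint (local) features are related to each other with scaled dot-product self-attention (the core of a transformer encoder), then pooled and concatenated with a global backbone feature. All names, dimensions, and weights here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over keypoints.

    X: (K, d) matrix, one row per human-keypoint feature.
    Returns a (K, d) matrix of relation-aware local features.
    """
    Q, K_, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K_.T / np.sqrt(Q.shape[-1])   # (K, K) pairwise relations
    return softmax(scores, axis=-1) @ V

K, d = 17, 32                           # e.g. 17 COCO keypoints, 32-dim features
X = rng.standard_normal((K, d))         # local (keypoint) features, assumed given
g = rng.standard_normal(d)              # global feature from a backbone, assumed given

Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
local = self_attention(X, Wq, Wk, Wv)          # (17, 32), relation-aware
fused = np.concatenate([local.mean(axis=0), g])  # (64,) fused local+global descriptor
```

A full version would stack several such attention layers with feed-forward blocks and layer normalization (a transformer encoder) and learn the weights end to end; this sketch only illustrates the data flow.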



Notes

  1. This work is supported by the National Key Research and Development Program of China under Grant 2018AAA0101602 and the National Natural Science Foundation of China (61922030).


Author information


Corresponding author

Correspondence to Yudi Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tang, Y., Wang, B., He, W. et al. PointDet++: an object detection framework based on human local features with transformer encoder. Neural Comput & Applic 35, 10097–10108 (2023). https://doi.org/10.1007/s00521-022-06938-7
