GET: group equivariant transformer for person detection of overhead fisheye images

Chen, Yongqing; Zhu, Dandan; Li, Nanyu; Zhou, You; Bai, Yong

doi:10.1007/s10489-023-04747-6

Published: 26 July 2023

Applied Intelligence, Volume 53, pages 24551–24565 (2023)

Yongqing Chen¹, Dandan Zhu², Nanyu Li³, You Zhou¹ & Yong Bai¹ (ORCID: orcid.org/0000-0002-2506-5981)

Abstract

Fisheye cameras have a large field of view, so they are widely used in scene monitoring, robot navigation, intelligent systems, virtual reality panoramas, augmented reality panoramas and other fields. However, person detection under an overhead fisheye camera remains challenging because of the camera's unique radial geometry and barrel distortion. Generic object detection algorithms do not work well for person detection on panoramic fisheye images. Recent approaches either use radially aligned bounding boxes to detect persons or adapt anchor-based methods to obtain rotated bounding boxes. However, these methods require additional hyperparameters (e.g., anchor boxes) and generalize poorly. To address this issue, we propose a novel model called Group Equivariant Transformer (GET), which uses a Transformer to directly regress the bounding boxes and their rotation angles. GET does not need any additional hyperparameters and generalizes well. In GET, we use a Group Equivariant Convolutional Network (GECN) and a Multi-Scale Encoder Module (MEM) to extract multi-scale rotated embedding features of the overhead fisheye image for the Transformer, and we propose an embedding optimization loss to improve the diversity of these features. Finally, a Decoder Module (DM) decodes the rotated bounding box information from the embedding features. Extensive experiments on three benchmark fisheye camera datasets demonstrate that the proposed method achieves state-of-the-art performance.
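The abstract says GET regresses bounding boxes together with rotation angles. The sketch below illustrates what such a rotated box means geometrically, assuming a (cx, cy, w, h, θ) parameterization with θ in degrees; the abstract does not state GET's actual angle convention or range, and the helper names (`rbox_to_corners`, `axis_aligned_hull`, `radial_angle`) are hypothetical. It also shows why axis-aligned boxes fit poorly in overhead fisheye views, where people tend to appear radially oriented around the image center.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rbox_to_corners(cx: float, cy: float, w: float, h: float,
                    theta_deg: float) -> List[Point]:
    """Corners of a w x h box centred at (cx, cy), rotated by theta_deg degrees.

    Uses the standard rotation matrix [[cos, -sin], [sin, cos]]; this
    (cx, cy, w, h, theta) convention is an assumption, not GET's documented one.
    """
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset about the origin, then translate to the centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(pts: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    acc = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0


def axis_aligned_hull(pts: List[Point]) -> Tuple[float, float, float, float]:
    """Tightest axis-aligned box (x_min, y_min, x_max, y_max) around the points."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


def radial_angle(cx: float, cy: float, img_cx: float, img_cy: float) -> float:
    """Hypothetical helper: the theta (degrees) that points the box's h-axis
    along the ray from the image centre to (cx, cy), i.e. a radially aligned box."""
    # Rotating the unit h-axis (0, 1) by theta gives (-sin theta, cos theta);
    # solve for that vector to be parallel to (cx - img_cx, cy - img_cy).
    return math.degrees(math.atan2(-(cx - img_cx), cy - img_cy))


if __name__ == "__main__":
    corners = rbox_to_corners(0.0, 0.0, 20.0, 60.0, 45.0)
    x0, y0, x1, y1 = axis_aligned_hull(corners)
    print(round(polygon_area(corners), 6))               # 1200.0
    print(round((x1 - x0) * (y1 - y0), 6))               # 3200.0
    print(round(radial_angle(100.0, 0.0, 0.0, 0.0), 6))  # -90.0
```

For a 20 × 60 box rotated by 45°, the rotated box covers 1200 square pixels while the tightest axis-aligned box around it covers 3200, so most of an axis-aligned detection would be background. Image coordinates usually have y pointing down, which flips the visual sense of rotation; the formulas stay consistent as long as one frame is used throughout.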

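GET's feature extractor is built on a Group Equivariant Convolutional Network (GECN). The abstract gives no layer details, so the following is only a minimal sketch of the general group-convolution technique of Cohen and Welling (2016), restricted here (an assumption) to the p4 group of 90° rotations plus translations. The same filter is applied at four rotations, and rotating the input image rotates every output map while cyclically shifting the orientation index. Plain Python lists keep the example dependency-free; all function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def rot90(a: Grid) -> Grid:
    """Rotate a square grid 90 degrees counter-clockwise (numpy.rot90 convention)."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate_valid(img: Grid, filt: Grid) -> Grid:
    """2D cross-correlation with 'valid' padding, as computed by CNN conv layers."""
    n, k = len(img), len(filt)
    m = n - k + 1
    return [[sum(img[i + a][j + b] * filt[a][b]
                 for a in range(k) for b in range(k))
             for j in range(m)]
            for i in range(m)]


def p4_lifting_conv(img: Grid, filt: Grid) -> List[Grid]:
    """Lift a square image to the p4 group: one map per filter rotation r * 90 degrees."""
    maps, f = [], filt
    for _ in range(4):
        maps.append(correlate_valid(img, f))
        f = rot90(f)  # next orientation of the same shared filter
    return maps


def orientation_max_pool(maps: List[Grid]) -> Grid:
    """Max over the 4 orientation channels; the pooled map rotates with the input."""
    m = len(maps[0])
    return [[max(mp[i][j] for mp in maps) for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    img = [[float((3 * i + 5 * j) % 7) for j in range(6)] for i in range(6)]
    sobel_x = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
    out = p4_lifting_conv(img, sobel_x)
    out_rot = p4_lifting_conv(rot90(img), sobel_x)
    # Rotating the input rotates every map and shifts the orientation index by one.
    assert all(out_rot[r] == rot90(out[(r - 1) % 4]) for r in range(4))
    assert orientation_max_pool(out_rot) == rot90(orientation_max_pool(out))
    print("p4 equivariance holds")
```

An ordinary convolution keeps only the r = 0 map, so in general a 90° rotation of the input does not yield a rotated copy of its features; the group convolution makes that relation exact for 90° steps. Arbitrary angles, as found around the circular field of view of a fisheye image, would need finer rotation groups or interpolation, which this sketch does not cover.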
Acknowledgements

This work was supported in part by the Hainan Provincial Natural Science Foundation of China under Grant 620RC556, the National Natural Science Foundation of China under Grant 61961014 and the National Natural Science Foundation of China under Grant 62001289.

Author information

Authors and Affiliations

School of Information and Communication Engineering, Hainan University, Haikou, 570228, China
Yongqing Chen, You Zhou & Yong Bai
Key Laboratory of Artificial Intelligence, Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200240, China
Dandan Zhu
Shantou Sanjiang Painting and Calligraphy Academy, Shenzhen, 518000, China
Nanyu Li

Corresponding author

Correspondence to Yong Bai.

Ethics declarations

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Y., Zhu, D., Li, N. et al. GET: group equivariant transformer for person detection of overhead fisheye images. Appl Intell 53, 24551–24565 (2023). https://doi.org/10.1007/s10489-023-04747-6

Accepted: 29 May 2023
Published: 26 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04747-6
