GET: group equivariant transformer for person detection of overhead fisheye images

Applied Intelligence

Abstract

Fisheye cameras have a large field of view, so they are widely used in scene monitoring, robot navigation, intelligent systems, virtual reality panoramas, augmented reality panoramas and other fields. However, person detection under an overhead fisheye camera remains a challenge because of its unique radial geometry and barrel distortion. Generic object detection algorithms do not work well for person detection on panoramic fisheye images. Recent approaches either use radially aligned bounding boxes to detect persons or improve anchor-based methods to obtain rotated bounding boxes. However, these methods require additional hyperparameters (e.g., anchor boxes) and have low generalization ability. To address this issue, we propose a novel model called Group Equivariant Transformer (GET), which uses the Transformer to directly regress the bounding boxes and rotation angles. GET does not need any additional hyperparameters and has good generalization ability. In GET, we use the Group Equivariant Convolutional Network (GECN) and Multi-Scale Encoder Module (MEM) to extract multi-scale rotated embedding features of the overhead fisheye image for the Transformer, and we propose an embedding optimization loss to improve the diversity of these features. Finally, we use a Decoder Module (DM) to decode the rotated bounding boxes' information from the embedding features. Extensive experiments conducted on three benchmark fisheye camera datasets demonstrate that the proposed method achieves state-of-the-art performance.
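As a rough illustration of the rotation equivariance that a group equivariant convolution provides, the following minimal sketch builds a lifting layer for the group of 90-degree rotations in pure Python. It is not the GECN described in the abstract: the choice of this four-rotation group, the single-channel input, and the helper names `rot90`, `correlate` and `lift` are all assumptions made for the example. The property shown is that rotating the input rotates every output feature map and cyclically shifts the orientation channels.

```python
def rot90(m):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*m)][::-1]


def correlate(x, k):
    """Valid 2D cross-correlation of image x with kernel k (no padding)."""
    n, m = len(k), len(k[0])
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(n) for b in range(m))
            for j in range(len(x[0]) - m + 1)
        ]
        for i in range(len(x) - n + 1)
    ]


def lift(x, k):
    """Correlate x with the four 90-degree rotations of k.

    Returns a list of four feature maps, one per kernel orientation
    (hypothetical stand-in for one group equivariant lifting layer).
    """
    maps, kernel = [], k
    for _ in range(4):
        maps.append(correlate(x, kernel))
        kernel = rot90(kernel)
    return maps


# Toy integer image and kernel, chosen arbitrarily for illustration.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
kernel = [[1, 0], [2, -1]]

features = lift(image, kernel)
features_of_rotated = lift(rot90(image), kernel)

# Equivariance: the features of the rotated image are the rotated features
# of the original image, with the orientation channels shifted by one.
for r in range(4):
    assert features_of_rotated[r] == rot90(features[(r - 1) % 4])
```

Because the response to a rotated person is a predictable transformation of the response to the upright person, a detector built on such features does not have to learn each orientation separately, which is relevant for overhead fisheye images where people appear at arbitrary angles.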

Acknowledgements

This work was supported in part by the Hainan Provincial Natural Science Foundation of China under Grant 620RC556, the National Natural Science Foundation of China under Grant 61961014 and the National Natural Science Foundation of China under Grant 62001289.

Author information

Corresponding author

Correspondence to Yong Bai.

Ethics declarations

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Y., Zhu, D., Li, N. et al. GET: group equivariant transformer for person detection of overhead fisheye images. Appl Intell 53, 24551–24565 (2023). https://doi.org/10.1007/s10489-023-04747-6

  • DOI: https://doi.org/10.1007/s10489-023-04747-6
