Abstract
Panoramic images have become increasingly popular as omnidirectional imaging technology has advanced. Many datasets and methods rely on object detection to better understand the content of panoramic images. These datasets and detectors use a Bounding Field of View (BFoV) as the bounding box in panoramic images. However, we observe that object instances in panoramic images often appear with arbitrary orientations, which indicates that the BFoV is an inappropriate bounding box and limits detector performance. This paper proposes a new bounding box representation, the Rotated Bounding Field of View (RBFoV), for panoramic image object detection. Based on the RBFoV, we then present a PANoramic Detection dataset for Object with oRientAtion (PANDORA). Finally, using PANDORA, we evaluate current state-of-the-art panoramic object detection methods and design an anchor-free object detector for panoramic images called R-CenterNet. Compared with these baselines, R-CenterNet achieves better detection performance. Our PANDORA dataset and source code are available at https://github.com/tdsuper/SphericalObjectDetection.
H. Xu and Q. Zhao—This work was done when Hang Xu and Qiang Zhao were at ICT.
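To make the difference between a BFoV and an RBFoV concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an RBFoV can be described by five angles in radians: the longitude and latitude of the box center, a horizontal and a vertical field of view, and a rotation angle. The names `RBFoV` and `corners_on_sphere`, the tangent-plane construction, and the sign convention of the rotation are all illustrative assumptions that the abstract does not specify.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RBFoV:
    """Hypothetical rotated bounding field of view; all angles in radians."""
    theta: float        # assumed: longitude of the box center
    phi: float          # assumed: latitude of the box center
    alpha: float        # assumed: horizontal field of view
    beta: float         # assumed: vertical field of view
    gamma: float = 0.0  # assumed: rotation about the axis through the center


def corners_on_sphere(box):
    """Return the four box corners as unit vectors (x, y, z).

    The box is laid out on the plane tangent to the unit sphere at its
    center, rotated by gamma within that plane, and projected back onto
    the sphere. With gamma = 0 this reduces to an unrotated BFoV.
    """
    ct, st = math.cos(box.theta), math.sin(box.theta)
    cp, sp = math.cos(box.phi), math.sin(box.phi)
    center = (cp * ct, cp * st, sp)      # point on the sphere at (theta, phi)
    east = (-st, ct, 0.0)                # tangent direction of increasing longitude
    north = (-sp * ct, -sp * st, cp)     # tangent direction of increasing latitude

    half_u = math.tan(box.alpha / 2)     # half-width on the tangent plane
    half_v = math.tan(box.beta / 2)      # half-height on the tangent plane
    cg, sg = math.cos(box.gamma), math.sin(box.gamma)

    corners = []
    for su, sv in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        u, v = su * half_u, sv * half_v
        # Rotate the corner by gamma inside the tangent plane.
        ur, vr = u * cg - v * sg, u * sg + v * cg
        p = tuple(c + ur * e + vr * n for c, e, n in zip(center, east, north))
        norm = math.sqrt(sum(x * x for x in p))
        corners.append(tuple(x / norm for x in p))
    return corners
```

Under these assumptions, a detector that regresses only the first four values can describe upright objects, while the extra angle lets the same box follow an object that appears tilted in the panorama.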
Acknowledgements
This work is supported by the National Key Research and Development Program of China (2020YFB1406604) and the National Natural Science Foundation of China (62072438, U1936110, 61931008, U21B2024).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, H. et al. (2022). PANDORA: A Panoramic Detection Dataset for Object with Orientation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_14
DOI: https://doi.org/10.1007/978-3-031-20074-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20073-1
Online ISBN: 978-3-031-20074-8
eBook Packages: Computer Science, Computer Science (R0)