
Real-Time Detection of Multiple Targets from a Moving 360\(^{\circ }\) Panoramic Imager in the Wild

  • Conference paper
Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12539)


Abstract

Our goal is to develop embedded and mobile vision applications that leverage state-of-the-art visual sensors and efficient neural network architectures deployed on emerging neural computing engines for smart monitoring and inspection purposes. In this paper, we present a 360\(^{\circ }\) vision system onboard an automobile or UAV platform for large field-of-view, real-time detection of multiple challenging objects. The targeted objects include a flag, as a deformable object; a UAV, as a tiny flying object whose scale and position change rapidly; and a group of objects, piled sandbags (themselves deformable objects in a group), a flag, and a stop sign, which together form a scene representing an artificial fake checkpoint. Barrel distortion introduced by the 360\(^{\circ }\) optics makes the detection task even more challenging. A lightweight neural network model based on the MobileNets architecture is trained by transfer learning to detect the custom objects from very limited training data. In method 1, we generate a dataset of perspective planar images via a virtual camera model that projects a patch on the hemisphere onto a 2D plane. In method 2, the panomorph images are used directly, without projection. Real-time detection of the objects in 360\(^{\circ }\) video is realized by feeding live-streamed frames, captured through a full-hemispheric (180\(^{\circ }\) \(\times \) 360\(^{\circ }\)) field-of-view ImmerVision Enables panomorph lens, to the trained MobileNets model. We found that with only a small amount of training data, far less than ten times the Vapnik–Chervonenkis dimension of the model, the MobileNets model achieves a detection rate of 80–90% on test data drawn from a distribution similar to that of the training data. However, the model's performance dropped drastically when it was put into action in the wild, on unknown data captured under different weather and lighting conditions. The generalization capability of the model can be improved by training with more data. The contribution of this work is a 360\(^{\circ }\) vision hardware and software system for real-time detection of challenging objects. The system can be configured for very low-power embedded applications by running inference on a neural computing engine such as the Intel Movidius NCS2 or the HiSilicon Kirin 970.
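The virtual camera model of method 1 is described above only at a high level. The sketch below shows one plausible implementation of such a hemisphere-to-plane projection in Python, assuming an ideal equidistant fisheye mapping (\(r = f\theta \)) rather than ImmerVision's calibrated panomorph profile; the function name, parameters, and the Rodrigues-vector orientation are illustrative assumptions, not the authors' code.

```python
import numpy as np
import cv2

def fisheye_to_perspective(img, fov_deg=60, yaw=0.0, pitch=0.0,
                           out_size=(416, 416)):
    """Render a perspective planar view of a hemispherical fisheye image.

    Assumes an ideal equidistant projection (r = f * theta); a real
    panomorph lens would need the vendor's calibrated distortion profile.
    """
    h_out, w_out = out_size
    H, W = img.shape[:2]
    cx, cy = W / 2.0, H / 2.0
    f_fish = min(cx, cy) / (np.pi / 2.0)  # image radius spans theta in [0, pi/2]

    # Pinhole intrinsics of the virtual camera.
    f_pin = (w_out / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    xs, ys = np.meshgrid(np.arange(w_out) - w_out / 2.0,
                         np.arange(h_out) - h_out / 2.0)
    rays = np.stack([xs, ys, np.full_like(xs, f_pin)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate the rays so the virtual camera looks toward (yaw, pitch)
    # on the hemisphere (axis-angle orientation, good enough for a sketch).
    R, _ = cv2.Rodrigues(np.array([pitch, yaw, 0.0]))
    rays = rays @ R.T

    # Equidistant forward projection: angle from optical axis -> fisheye radius.
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = f_fish * theta
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```

Rendering several such virtual views at different (yaw, pitch) orientations covers the hemisphere with distortion-free planar patches, so a standard planar detector can be trained and applied without modeling the barrel distortion explicitly.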

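For the real-time stage, live panomorph frames are fed to the trained MobileNets detector, with inference optionally offloaded to a neural compute engine. Below is a minimal, hypothetical sketch of such a loop using OpenCV's DNN module; the model file names are placeholders (the trained graph from the paper is not public), a MobileNet-SSD-style 300\(\times \)300 input and output layout is assumed, and the MYRIAD target requires an OpenVINO-enabled OpenCV build driving an Intel Movidius NCS2.

```python
import cv2

# Placeholder file names: the authors' transfer-learned graph is not public.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "graph_config.pbtxt")
# Offload inference to an Intel Movidius NCS2 (needs an OpenVINO build of OpenCV).
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

cap = cv2.VideoCapture(0)  # live panomorph stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MobileNet-SSD expects a fixed square input; 300x300 is typical.
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 127.5, size=(300, 300),
                                 mean=(127.5, 127.5, 127.5), swapRB=True)
    net.setInput(blob)
    detections = net.forward()  # SSD output layout: [1, 1, N, 7]
    h, w = frame.shape[:2]
    for det in detections[0, 0]:
        if float(det[2]) < 0.5:  # det[2] is the confidence score
            continue
        # det[3:7] are normalized box corners; scale to pixel coordinates.
        box = (det[3:7] * [w, h, w, h]).astype(int)
        cv2.rectangle(frame, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)
    cv2.imshow("360 detection", frame)
    if cv2.waitKey(1) == 27:  # Esc
        break
cap.release()
```

Method 2 corresponds to running this loop directly on the raw panomorph frame; method 1 would insert the virtual-camera reprojection from the previous sketch before the blobFromImage call.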

Change history

  • 11 May 2021

    In the originally published version of chapter 8, an acknowledgement was missing. This has been corrected.


Acknowledgement

This work is supported by the project Spacetime Vision – Towards Unsupervised Learning in the 4D World, funded under EEA grant number EEA-RO-NO-2018-0496.

Author information

Corresponding author

Correspondence to Boyan Yuan.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Yuan, B., Belbachir, N. (2020). Real-Time Detection of Multiple Targets from a Moving 360\(^{\circ }\) Panoramic Imager in the Wild. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_8


  • DOI: https://doi.org/10.1007/978-3-030-68238-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68237-8

  • Online ISBN: 978-3-030-68238-5

  • eBook Packages: Computer Science, Computer Science (R0)
