
Real-Time Detection of Multiple Targets from a Moving 360\(^{\circ }\) Panoramic Imager in the Wild

  • Conference paper
Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12539)


Abstract

Our goal is to develop embedded and mobile vision applications that leverage state-of-the-art visual sensors and efficient neural network architectures deployed on emerging neural computing engines for smart monitoring and inspection purposes. In this paper, we present a 360\(^{\circ }\) vision system onboard an automobile or UAV platform for large field-of-view, real-time detection of multiple challenging objects. The targeted objects include a flag, as a deformable object; a UAV, as a tiny flying object whose scale and position change rapidly; and a group of objects, piled sandbags (themselves deformable objects in a group), a flag, and a stop sign, which together form a scene representing an artificial fake checkpoint. Barrel distortion introduced by the 360\(^{\circ }\) optics makes the detection task even more challenging. A lightweight neural network model based on the MobileNets architecture is trained by transfer learning to detect the custom objects from very limited training data. In method 1, we generate a dataset of perspective planar images via a virtual camera model that projects a patch on the hemisphere onto a 2D plane. In method 2, the panomorph images are used directly, without projection. Real-time detection of the objects in 360\(^{\circ }\) video is realized by feeding live-streamed frames, captured through a full-hemispheric (180\(^{\circ }\) \(\times \) 360\(^{\circ }\)) field-of-view ImmerVision Enables panomorph lens, to the trained MobileNets model. We found that with only a small amount of training data, far less than ten times the Vapnik–Chervonenkis dimension of the model, the MobileNets model achieves a detection rate of 80–90% on test data drawn from a distribution similar to that of the training data. However, the model's performance dropped drastically when it was put into action in the wild, on unknown data captured under different weather and lighting conditions. The generalization capability of the model can be improved by training with more data. The contribution of this work is a 360\(^{\circ }\) vision hardware and software system for real-time detection of challenging objects. The system can be configured for very low-power embedded applications by running inference on a neural computing engine such as the Intel Movidius NCS2 or the HiSilicon Kirin 970.
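The virtual camera model of method 1 is described above only at a high level. The sketch below shows one plausible implementation of such a hemisphere-to-plane projection in Python, assuming an ideal equidistant fisheye mapping (\(r = f\theta \)) rather than ImmerVision's calibrated panomorph profile; the function name, parameters, and the Rodrigues-vector orientation are illustrative assumptions, not the authors' code.

```python
import numpy as np
import cv2

def fisheye_to_perspective(img, fov_deg=60, yaw=0.0, pitch=0.0,
                           out_size=(416, 416)):
    """Render a perspective planar view of a hemispherical fisheye image.

    Assumes an ideal equidistant projection (r = f * theta); a real
    panomorph lens would need the vendor's calibrated distortion profile.
    """
    h_out, w_out = out_size
    H, W = img.shape[:2]
    cx, cy = W / 2.0, H / 2.0
    f_fish = min(cx, cy) / (np.pi / 2.0)  # image radius spans theta in [0, pi/2]

    # Pinhole intrinsics of the virtual camera.
    f_pin = (w_out / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    xs, ys = np.meshgrid(np.arange(w_out) - w_out / 2.0,
                         np.arange(h_out) - h_out / 2.0)
    rays = np.stack([xs, ys, np.full_like(xs, f_pin)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate the rays so the virtual camera looks toward (yaw, pitch)
    # on the hemisphere (axis-angle orientation, good enough for a sketch).
    R, _ = cv2.Rodrigues(np.array([pitch, yaw, 0.0]))
    rays = rays @ R.T

    # Equidistant forward projection: angle from optical axis -> fisheye radius.
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = f_fish * theta
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```

Rendering several such virtual views at different (yaw, pitch) orientations covers the hemisphere with distortion-free planar patches, so a standard planar detector can be trained and applied without modeling the barrel distortion explicitly.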

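For the real-time stage, live panomorph frames are fed to the trained MobileNets detector, with inference optionally offloaded to a neural compute engine. Below is a minimal, hypothetical sketch of such a loop using OpenCV's DNN module; the model file names are placeholders (the trained graph from the paper is not public), a MobileNet-SSD-style 300\(\times \)300 input and output layout is assumed, and the MYRIAD target requires an OpenVINO-enabled OpenCV build driving an Intel Movidius NCS2.

```python
import cv2

# Placeholder file names: the authors' transfer-learned graph is not public.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "graph_config.pbtxt")
# Offload inference to an Intel Movidius NCS2 (needs an OpenVINO build of OpenCV).
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

cap = cv2.VideoCapture(0)  # live panomorph stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MobileNet-SSD expects a fixed square input; 300x300 is typical.
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 127.5, size=(300, 300),
                                 mean=(127.5, 127.5, 127.5), swapRB=True)
    net.setInput(blob)
    detections = net.forward()  # SSD output layout: [1, 1, N, 7]
    h, w = frame.shape[:2]
    for det in detections[0, 0]:
        if float(det[2]) < 0.5:  # det[2] is the confidence score
            continue
        # det[3:7] are normalized box corners; scale to pixel coordinates.
        box = (det[3:7] * [w, h, w, h]).astype(int)
        cv2.rectangle(frame, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)
    cv2.imshow("360 detection", frame)
    if cv2.waitKey(1) == 27:  # Esc
        break
cap.release()
```

Method 2 corresponds to running this loop directly on the raw panomorph frame; method 1 would insert the virtual-camera reprojection from the previous sketch before the blobFromImage call.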

Change history

  • 11 May 2021

    In the originally published version of chapter 8, an acknowledgement was missing. This has been corrected.


Acknowledgement

This work is supported by the project Spacetime Vision – Towards Unsupervised Learning in the 4D World, funded under EEA grant number EEA-RO-NO-2018-0496.

Author information

Corresponding author

Correspondence to Boyan Yuan.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Yuan, B., Belbachir, N. (2020). Real-Time Detection of Multiple Targets from a Moving 360\(^{\circ }\) Panoramic Imager in the Wild. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_8


  • DOI: https://doi.org/10.1007/978-3-030-68238-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68237-8

  • Online ISBN: 978-3-030-68238-5

  • eBook Packages: Computer Science, Computer Science (R0)
