Abstract
Our goal is to develop embedded and mobile vision applications that leverage state-of-the-art visual sensors and efficient neural network architectures deployed on emerging neural computing engines for smart monitoring and inspection. In this paper, we present a 360\(^{\circ }\) vision system onboard an automobile or UAV platform for large field-of-view, real-time detection of multiple challenging objects. The targeted objects include a flag, as a deformable object; a UAV, as a tiny flying object that changes its scale and position rapidly; and grouped objects, in which piled sandbags (themselves deformable objects in a group), a flag and a stop sign form a scene representing an artificial (fake) checkpoint. Barrel distortion owing to the 360\(^{\circ }\) optics makes the detection task even more challenging. A lightweight neural network model based on the MobileNets architecture is trained by transfer learning to detect the custom objects with very limited training data. In method 1, we generate a dataset of perspective planar images via a virtual camera model that projects a patch on the hemisphere onto a 2D plane. In method 2, the panomorph images are used directly without projection. Real-time detection of the objects in 360\(^{\circ }\) video is realized by feeding live-streamed frames captured by the full hemispheric (180\(^{\circ }\) \(\times \) 360\(^{\circ }\)) field-of-view ImmerVision Enables panomorph lens to the trained MobileNets model. We found that with only a small amount of training data, far less than 10 times the Vapnik–Chervonenkis dimension of the model, the MobileNets model achieves a detection rate of 80–90% on test data with a distribution similar to that of the training data. However, the model performance dropped drastically when it was deployed in the wild on unseen data in which both weather and lighting conditions were different. The generalization capability of the model can be improved by training with more data.
The contribution of this work is a 360\(^{\circ }\) vision hardware and software system for real-time detection of challenging objects. This system could be configured for very low-power embedded applications by running inference on a neural computing engine such as the Intel Movidius NCS2 or the HiSilicon Kirin 970.
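The virtual camera of method 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the panomorph lens has its own distortion profile, which the abstract does not give, so the code below assumes an ideal equidistant fisheye (image radius proportional to the angle from the optical axis) covering a 90\(^{\circ }\) half angle. The function names and all parameters (`pan_deg`, `tilt_deg`, `cx`, `cy`, `radius`) are hypothetical and chosen only for illustration.

```python
import math


def perspective_to_fisheye(u, v, out_w, out_h, fov_deg, pan_deg, tilt_deg,
                           cx, cy, radius):
    """Map pixel (u, v) of a virtual pinhole camera to fisheye image coordinates.

    Assumption: ideal equidistant lens with optical axis +Z, whose image
    circle of `radius` pixels centred at (cx, cy) covers a 90-degree half angle.
    """
    # Focal length (pixels) of the virtual camera from its horizontal FOV.
    f = (out_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel in the virtual camera frame (camera looks along +Z).
    x = u - (out_w - 1) / 2.0
    y = v - (out_h - 1) / 2.0
    z = f
    # Tilt: rotate about the X axis, steering the view away from the lens axis.
    t = math.radians(tilt_deg)
    y, z = y * math.cos(t) - z * math.sin(t), y * math.sin(t) + z * math.cos(t)
    # Pan: rotate about the lens optical axis (Z).
    p = math.radians(pan_deg)
    x, y = x * math.cos(p) - y * math.sin(p), x * math.sin(p) + y * math.cos(p)
    # Polar angle (from the lens axis) and azimuth of the ray.
    norm = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / norm)))
    phi = math.atan2(y, x)
    # Equidistant projection: radial distance grows linearly with theta.
    r = radius * theta / (math.pi / 2.0)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)


def build_lookup(out_w, out_h, **view):
    """Source coordinates for every pixel of the virtual view (rows of (x, y))."""
    return [[perspective_to_fisheye(u, v, out_w, out_h, **view)
             for u in range(out_w)]
            for v in range(out_h)]
```

A lookup table built this way could then be used to resample the hemispheric frame into a planar patch, for example with bilinear interpolation through a routine such as OpenCV's `remap`. With a real panomorph lens, the linear radius-to-angle relation would have to be replaced by the lens's calibrated distortion curve.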
Change history
11 May 2021
In the originally published version of chapter 8 an acknowledgement was missing. This has been corrected.
Acknowledgement
This work is supported by the project Spacetime Vision – Towards Unsupervised Learning in the 4D World funded under the EEA grant number EEA-RO-NO-2018-0496.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yuan, B., Belbachir, N. (2020). Real-Time Detection of Multiple Targets from a Moving 360\(^{\circ }\) Panoramic Imager in the Wild. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_8
DOI: https://doi.org/10.1007/978-3-030-68238-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68237-8
Online ISBN: 978-3-030-68238-5
eBook Packages: Computer Science (R0)