DeepPilot4Pose: a fast pose localisation for MAV indoor flight using the OAK-D camera

  • Original Research Paper
  • Journal of Real-Time Image Processing

Abstract

We present DeepPilot4Pose, a compact convolutional neural network for visual pose estimation that runs onboard a novel smart camera, the OAK-D. We aim to use it to localise a micro aerial vehicle (MAV) flying in an indoor environment where neither GPS nor external sensors are available. This calls for onboard processing, which demands a combination of software and hardware that runs efficiently onboard the MAV. To this end, we exploit a novel sensor that the MAV can carry, the OAK-D camera, which performs neural inference on its chip in addition to providing colour, monochromatic and depth images. We show that DeepPilot4Pose runs efficiently on the OAK-D at \(65\,\text{Hz}\), with a localisation performance comparable to that obtained with RGB-D ORB-SLAM using the OAK-D and running onboard the MAV on the Intel Compute Stick at \(12\,\text{Hz}\). We have evaluated our approach on benchmark datasets and in real MAV flights in an indoor facility with a challenging visual appearance.
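
To put the two update rates quoted above in perspective, the following minimal sketch converts each rate into a time budget per pose estimate and into the distance a vehicle covers between estimates. Only the two rates come from the abstract; the function names and the flight speed of 1 m/s are hypothetical values chosen for illustration, not figures from the paper.

```python
# Update rates quoted in the abstract; everything else is illustrative.
DEEPPILOT4POSE_HZ = 65.0  # DeepPilot4Pose on the OAK-D
ORB_SLAM_HZ = 12.0        # RGB-D ORB-SLAM on the Intel Compute Stick


def period_ms(rate_hz: float) -> float:
    """Mean time between consecutive pose estimates, in milliseconds."""
    return 1000.0 / rate_hz


def travel_between_updates_m(speed_m_s: float, rate_hz: float) -> float:
    """Distance covered between two pose estimates at a constant speed."""
    return speed_m_s / rate_hz


def rate_ratio(fast_hz: float, slow_hz: float) -> float:
    """How many estimates the faster system produces per slower estimate."""
    return fast_hz / slow_hz


if __name__ == "__main__":
    speed = 1.0  # hypothetical flight speed in m/s, not taken from the paper
    for name, hz in (("DeepPilot4Pose", DEEPPILOT4POSE_HZ),
                     ("RGB-D ORB-SLAM", ORB_SLAM_HZ)):
        print(f"{name}: {period_ms(hz):.1f} ms per estimate, "
              f"{100 * travel_between_updates_m(speed, hz):.1f} cm between estimates")
    print(f"rate ratio: {rate_ratio(DEEPPILOT4POSE_HZ, ORB_SLAM_HZ):.1f}x")
```

Under these assumptions, a rate of 65 Hz leaves roughly 15.4 ms per estimate against roughly 83.3 ms at 12 Hz, a ratio of about 5.4. At the assumed 1 m/s the vehicle would move about 1.5 cm between estimates in the first case and about 8.3 cm in the second.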

Data availability

The datasets generated and analysed during the current study are available at https://mnemosyne.inaoep.mx/index.php/s/uDiD4SZjw19EYuz. A video showing the system's performance is available at https://www.youtube.com/watch?v=Jtf8e06CZoo.

Acknowledgements

The first author is thankful to Consejo Nacional de Ciencia y Tecnologia (CONACYT) for her scholarship no. 924254.

Author information

Corresponding author

Correspondence to Jose Martinez-Carranza.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Rojas-Perez, L.O., Martinez-Carranza, J. DeepPilot4Pose: a fast pose localisation for MAV indoor flight using the OAK-D camera. J Real-Time Image Proc 20, 8 (2023). https://doi.org/10.1007/s11554-023-01259-x

  • DOI: https://doi.org/10.1007/s11554-023-01259-x
