DeepPilot4Pose: a fast pose localisation for MAV indoor flight using the OAK-D camera

Rojas-Perez, L. Oyuki; Martinez-Carranza, Jose

doi:10.1007/s11554-023-01259-x

Original Research Paper
Published: 02 February 2023

Volume 20, article number 8 (2023)

Journal of Real-Time Image Processing

Abstract

We present DeepPilot4Pose, a compact convolutional neural network for visual pose estimation that runs onboard a novel smart camera, the OAK-D. We aim to use it to localise a micro aerial vehicle (MAV) flying in an indoor environment where neither GPS nor external sensors are available. This calls for onboard processing, which demands a combination of software and hardware that can run efficiently onboard the MAV. To this end, we exploit the OAK-D camera, a novel sensor that can be carried by the MAV and that performs neural inference on its own chip in addition to providing colour, monochromatic and depth images. We show that DeepPilot4Pose runs efficiently on the OAK-D at 65 Hz, with a localisation performance comparable to that obtained with RGB-D ORB-SLAM using the OAK-D and running onboard the MAV on the Intel Compute Stick at 12 Hz. We have evaluated our approach on benchmark datasets and in real MAV flights in an indoor facility with a challenging visual appearance.
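To put the two update rates in perspective, the short sketch below converts them into a time budget per pose estimate. It is only an illustration of the arithmetic: the two rates come from the abstract, while the function and variable names are hypothetical and not taken from the paper.

```python
def frame_period_ms(rate_hz: float) -> float:
    """Return the time between consecutive pose estimates, in milliseconds."""
    return 1000.0 / rate_hz


# Rates reported in the abstract.
CNN_RATE_HZ = 65.0   # DeepPilot4Pose running on the OAK-D
SLAM_RATE_HZ = 12.0  # RGB-D ORB-SLAM running on the Intel Compute Stick

cnn_period_ms = frame_period_ms(CNN_RATE_HZ)
slam_period_ms = frame_period_ms(SLAM_RATE_HZ)
rate_ratio = CNN_RATE_HZ / SLAM_RATE_HZ

print(f"CNN period:  {cnn_period_ms:.1f} ms per estimate")
print(f"SLAM period: {slam_period_ms:.1f} ms per estimate")
print(f"Rate ratio:  {rate_ratio:.1f}x")
```

Under these figures, a new pose estimate arrives roughly every 15 ms with the network, against roughly every 83 ms with the SLAM baseline, which is about a fivefold difference in update rate.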

Data availability

The datasets generated and analysed during the current study are available at https://mnemosyne.inaoep.mx/index.php/s/uDiD4SZjw19EYuz. A video showing the system's performance is available at https://www.youtube.com/watch?v=Jtf8e06CZoo.

Acknowledgements

The first author is thankful to Consejo Nacional de Ciencia y Tecnologia (CONACYT) for her scholarship no. 924254.

Author information

Authors and Affiliations

INAOE, San Andrés, Cholula, Mexico
L. Oyuki Rojas-Perez & Jose Martinez-Carranza

Authors

L. Oyuki Rojas-Perez
Jose Martinez-Carranza

Corresponding author

Correspondence to Jose Martinez-Carranza.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rojas-Perez, L.O., Martinez-Carranza, J. DeepPilot4Pose: a fast pose localisation for MAV indoor flight using the OAK-D camera. J Real-Time Image Proc 20, 8 (2023). https://doi.org/10.1007/s11554-023-01259-x

Received: 17 May 2022
Accepted: 30 October 2022
Published: 02 February 2023
DOI: https://doi.org/10.1007/s11554-023-01259-x
