Abstract
Depth estimation for pixels with a low intensity gradient is a difficult task. Because such pixels carry little information relative to their neighborhood (i.e., they are similar in intensity due to lack of texture), tracking them for depth across consecutive temporal images is hard. Recently, dense-mapping methods based on mid-level features such as superpixels or planes have been proposed to estimate the depth of low-gradient pixels. These methods require fewer computational resources, making it possible to run a full monocular Simultaneous Localization And Mapping (SLAM) pipeline on a CPU, i.e., without a GPU. In existing superpixel-based approaches, superpixels are formed by grouping pixels of similar intensity, and their depth map is estimated from the high-gradient border pixels. Their drawback is the high computational time required for superpixel segmentation, semi-dense mapping, and planar mapping. This paper proposes a method to estimate the depth map of planar regions using binary images instead of full intensity-range images (gray/color), together with a novel solution to differentiate differently oriented planes that are grouped into a single superpixel due to similar intensities. The computational time is reduced while the segmentation efficiency improves, as demonstrated in real-time experiments. The efficacy of the proposed method is compared with recent learning-based and model-based techniques on publicly available datasets. The results show a denser depth map and lower computational time than other model-based approaches, and better depth-estimation accuracy than learning-based approaches.
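The core idea of segmenting a binary image rather than the full intensity range can be illustrated with a minimal sketch: threshold a grayscale image, then label its connected regions so that each region serves as a superpixel candidate. The function name, the fixed threshold, and the 4-connectivity choice below are illustrative assumptions, not the paper's actual algorithm; the paper's method additionally splits regions that span differently oriented planes.

```python
import numpy as np
from collections import deque

def binary_superpixels(gray, threshold=128):
    """Binarize a grayscale image and label its 4-connected regions.

    Each connected region of the binary image acts as a superpixel
    candidate. Labeling a 1-bit image is much cheaper than clustering
    over the full intensity range, which is the motivation sketched
    in the abstract. (Illustrative only; not the paper's algorithm.)
    """
    binary = np.asarray(gray) >= threshold
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx]:
                continue  # already assigned to a region
            next_label += 1
            val = binary[sy, sx]
            labels[sy, sx] = next_label
            q = deque([(sy, sx)])
            while q:  # breadth-first flood fill over equal-valued pixels
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny, nx]
                            and binary[ny, nx] == val):
                        labels[ny, nx] = next_label
                        q.append((ny, nx))
    return labels

# A tiny image with one dark L-shaped region and one bright block.
img = np.array([[ 10,  10, 200],
                [ 10, 200, 200],
                [ 10,  10,  10]])
lab = binary_superpixels(img)
print(lab.max())  # → 2  (two connected regions)
```

In practice an optimized connected-components pass (e.g., the block-based labeling of Grana et al. cited in the references) would replace this per-pixel flood fill, but the complexity argument is the same: the segmentation cost depends on the number of pixels, not on the intensity range.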
Cite this article
Yathirajam, B., Sevoor Meenakshisundaram, V. & Challaghatta Muniyappa, A. Superpixels Using Binary Images for Monocular Visual-Inertial Dense Mapping. J Sign Process Syst 94, 1485–1505 (2022). https://doi.org/10.1007/s11265-022-01754-7