Abstract
The Visual Mesh is an input transform for deep learning that allows depth independent object detection at very high frame rates. The present study introduces a Visual Mesh based stereo vision method for sparse stereo semantic segmentation. A dataset of simulated 3D scenes was generated and used for training to show that the method is capable of processing high resolution stereo inputs to generate both left and right sparse semantic maps. The new stereo method demonstrated better classification accuracy than the corresponding monocular approach. The high frame rates and high accuracy may make the proposed approach attractive to fast-paced on-board robot or IoT applications.
A. Biddulph was supported by an Australian Government Research Training Program scholarship and a top-up scholarship through 4Tel Pty Ltd. S. Chalup was supported by ARC DP210103304.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abadi, M., et al.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, November 2021. https://www.tensorflow.org/
Bosch, M., Foster, K., Christie, G., Wang, S., Hager, G.D., Brown, M.: Semantic stereo for incidental satellite images. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1524–1532, January 2019. https://doi.org/10.1109/WACV.2019.00167
Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410–5418 (2018). http://openaccess.thecvf.com/content_cvpr_2018/html/Chang_Pyramid_Stereo_Matching_CVPR_2018_paper.html
Chen, H., et al.: Multi-level fusion of the multi-receptive fields contextual networks and disparity network for pairwise semantic stereo. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 4967–4970, July 2019. https://doi.org/10.1109/IGARSS.2019.8899306
Chen, X., Liu, Y., Achuthan, K.: WODIS: water obstacle detection network based on image segmentation for autonomous surface vehicles in maritime environments. IEEE Trans. Instrum. Meas. 70, 1–13 (2021). https://doi.org/10.1109/TIM.2021.3092070
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
van Dijk, S.G., Scheunemann, M.M.: Deep learning for semantic segmentation on minimal hardware. In: Holz, D., Genter, K., Saad, M., von Stryk, O. (eds.) RoboCup 2018. LNCS (LNAI), vol. 11374, pp. 349–361. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-27544-0_29
Durner, M., Boerdijk, W., Sundermeyer, M., Friedl, W., Marton, Z.C., Triebel, R.: Unknown Object Segmentation from Stereo Images. arXiv:2103.06796 [cs], March 2021. http://arxiv.org/abs/2103.06796
Fan, R., Wang, H., Cai, P., Liu, M.: SNE-RoadSeg: incorporating surface normal information into semantic segmentation for accurate freespace detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 340–356. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_21
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016). https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html
Houliston, T., Chalup, S.K.: Visual mesh: real-time object detection using constant sample density. In: Holz, D., Genter, K., Saad, M., von Stryk, O. (eds.) RoboCup 2018. LNCS (LNAI), vol. 11374, pp. 45–56. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-27544-0_4
Houliston, T.J.: Software architecture and computer vision for resource constrained robotics. Ph.D. thesis, University of Newcastle (2018). http://hdl.handle.net/1959.13/1389336
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv: 1704.04861 [cs], April 2017. http://arxiv.org/abs/1704.04861
Huang, G., Liu, Z., v. d. Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269, July 2017. https://doi.org/10.1109/CVPR.2017.243
Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural networks. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https://papers.nips.cc/paper/2017/hash/5d44ee6f2c3f71b73125876103c8f6c4-Abstract.html
Königshof, H., Salscheider, N.O., Stiller, C.: Realtime 3D object detection for automated driving using stereo vision and semantic information. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 1405–1410, October 2019. https://doi.org/10.1109/ITSC.2019.8917330
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 9–48. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_3
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324
Matthews, B.W.: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Struct. 405(2), 442–451 (1975). https://doi.org/10.1016/0005-2795(75)90109-9
Michel, O.: Cyberbotics Ltd., Webots ™: Professional mobile robot simulation. Int. J. Adv. Robot. Syst. 1(1), 5 (2004). https://doi.org/10.5772/5618
Miclea, V.C., Nedevschi, S.: Real-time semantic segmentation-based stereo reconstruction. IEEE Trans. Intell. Transp. Syst. 21(4), 1514–1524 (2020). https://doi.org/10.1109/TITS.2019.2913883
Mohammed, A., Yildirim, S., Farup, I., Pedersen, M., Hovde, Ø.: StreoScenNet: surgical stereo robotic scene segmentation. In: Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10951, pp. 174–182, March 2019. https://doi.org/10.1117/12.2512518
Peng, H., et al.: An adaptive coarse-fine semantic segmentation method for the attachment recognition on marine current turbines. Comput. Electr. Eng. 93, 107182 (2021). https://doi.org/10.1016/j.compeleceng.2021.107182. https://www.sciencedirect.com/science/article/pii/S004579062100183X
Peng, J., Shen, J., Li, X.: High-order energies for stereo segmentation. IEEE Trans. Cybernet. 46(7), 1616–1627 (2016). https://doi.org/10.1109/TCYB.2015.2453091
Qin, R., Huang, X., Liu, W., Xiao, C.: Pairwise stereo image disparity and semantics estimation with the combination of U-Net and pyramid stereo matching network. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 4971–4974, July 2019. https://doi.org/10.1109/IGARSS.2019.8900262
Ramachandran, S., Sistu, G., McDonald, J.B., Yogamani, S.K.: Woodscape fisheye semantic segmentation for autonomous driving - CVPR 2021 OmniCV workshop challenge. CoRR abs/2107.08246 (2021). https://arxiv.org/abs/2107.08246
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. arXiv: 1506.02640 [cs], May 2016. http://arxiv.org/abs/1506.02640
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. arXiv: 1612.08242 [cs], December 2016. http://arxiv.org/abs/1612.08242
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv: 1804.02767 [cs], April 2018. http://arxiv.org/abs/1804.02767
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472, March 2017. https://doi.org/10.1109/WACV.2017.58
Szemenyei, M., Estivill-Castro, V.: Real-time scene understanding using deep neural networks for RoboCup SPL. In: Holz, D., Genter, K., Saad, M., von Stryk, O. (eds.) RoboCup 2018. LNCS (LNAI), vol. 11374, pp. 96–108. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-27544-0_8
Tanksale, N.: Finding Good Learning Rate and The One Cycle Policy, May 2019. https://towardsdatascience.com/finding-good-learning-rate-and-the-one-cycle-policy-7159fe1db5d6
Tasli, H.E., Alatan, A.A.: User assisted stereo image segmentation. In: 2012 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1–4, October 2012. https://doi.org/10.1109/3DTV.2012.6365447
Wright, L.: Ranger - a synergistic optimizer (2019). https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
Wu, Z., Wu, X., Zhang, X., Wang, S., Ju, L.: Semantic stereo matching with pyramid cost volumes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7484–7493 (2019). https://openaccess.thecvf.com/content_ICCV_2019/html/Wu_Semantic_Stereo_Matching_With_Pyramid_Cost_Volumes_ICCV_2019_paper.html
Yogamani, S., et al.: WoodScape: a multi-task, multi-camera fisheye dataset for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019
Zhou, L., Zhang, H.: 3SP-Net: semantic segmentation network with stereo image pairs for urban scene parsing. In: Geng, X., Kang, B.-H. (eds.) PRICAI 2018. LNCS (LNAI), vol. 11012, pp. 503–517. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97304-3_39
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Biddulph, A., Houliston, T., Mendes, A., Chalup, S. (2024). Stereo Visual Mesh for Generating Sparse Semantic Maps at High Frame Rates. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_12
Download citation
DOI: https://doi.org/10.1007/978-981-99-8076-5_12
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8075-8
Online ISBN: 978-981-99-8076-5
eBook Packages: Computer ScienceComputer Science (R0)