Stereo Visual Mesh for Generating Sparse Semantic Maps at High Frame Rates

  • Conference paper
Neural Information Processing (ICONIP 2023)

Abstract

The Visual Mesh is an input transform for deep learning that allows depth-independent object detection at very high frame rates. The present study introduces a Visual Mesh-based stereo vision method for sparse stereo semantic segmentation. A dataset of simulated 3D scenes was generated and used for training to show that the method can process high-resolution stereo inputs to generate both left and right sparse semantic maps. The new stereo method demonstrated better classification accuracy than the corresponding monocular approach. The high frame rates and high accuracy may make the proposed approach attractive for fast-paced on-board robot or IoT applications.

A. Biddulph was supported by an Australian Government Research Training Program scholarship and a top-up scholarship through 4Tel Pty Ltd. S. Chalup was supported by ARC DP210103304.

Corresponding author

Correspondence to Alexander Biddulph.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Biddulph, A., Houliston, T., Mendes, A., Chalup, S. (2024). Stereo Visual Mesh for Generating Sparse Semantic Maps at High Frame Rates. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_12

  • DOI: https://doi.org/10.1007/978-981-99-8076-5_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8075-8

  • Online ISBN: 978-981-99-8076-5

  • eBook Packages: Computer Science, Computer Science (R0)