Stereo Visual Mesh for Generating Sparse Semantic Maps at High Frame Rates

Biddulph, Alexander; Houliston, Trent; Mendes, Alexandre; Chalup, Stephan

doi:10.1007/978-981-99-8076-5_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14452)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The Visual Mesh is an input transform for deep learning that allows depth-independent object detection at very high frame rates. The present study introduces a Visual Mesh-based stereo vision method for sparse stereo semantic segmentation. A dataset of simulated 3D scenes was generated and used for training to show that the method can process high-resolution stereo inputs to generate both left and right sparse semantic maps. The new stereo method demonstrated better classification accuracy than the corresponding monocular approach. The high frame rates and high accuracy may make the proposed approach attractive for fast-paced on-board robot or IoT applications.

A. Biddulph was supported by an Australian Government Research Training Program scholarship and a top-up scholarship through 4Tel Pty Ltd. S. Chalup was supported by ARC DP210103304.

Author information

Authors and Affiliations

The University of Newcastle, Callaghan, Australia
Alexander Biddulph, Alexandre Mendes & Stephan Chalup
4Tel Pty Ltd., Newcastle, Australia
Alexander Biddulph & Trent Houliston

Authors

Alexander Biddulph
Trent Houliston
Alexandre Mendes
Stephan Chalup

Corresponding author

Correspondence to Alexander Biddulph.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Biao Luo
Chinese Academy of Sciences, Beijing, China
Long Cheng
Zhejiang University, Hangzhou, China
Zheng-Guang Wu
Guangdong University of Technology, Guangzhou, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Biddulph, A., Houliston, T., Mendes, A., Chalup, S. (2024). Stereo Visual Mesh for Generating Sparse Semantic Maps at High Frame Rates. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_12

DOI: https://doi.org/10.1007/978-981-99-8076-5_12
Published: 14 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8075-8
Online ISBN: 978-981-99-8076-5
eBook Packages: Computer Science (R0)
