
Cross-CBAM: a lightweight network for real-time scene segmentation

  • Research
  • Published in: Journal of Real-Time Image Processing

Abstract

Real-time semantic segmentation poses a significant challenge in scene parsing. Although traditional semantic segmentation networks have made remarkable advances in accuracy, their inference speed remains unsatisfactory. This paper introduces the Cross-CBAM network, a novel lightweight architecture designed for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling module (SE-ASPP) is proposed to obtain a variable field of view and multiscale information. Additionally, we propose a Cross Convolutional Block Attention Module (CCBAM), in which a cross-multiply operation guides low-level detail information with high-level semantic information. Unlike previous approaches that use attention to concentrate on relevant information in the backbone, CCBAM applies cross-attention to feature fusion within the Feature Pyramid Network (FPN) structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed Cross-CBAM model, which achieves a promising trade-off between segmentation accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
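The cross-attention fusion described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: the abstract specifies a cross-multiply operation in which attention derived from one branch re-weights the other, but the exact layer configuration (the MLP in the channel attention, convolution kernel sizes, and how the re-weighted branches are merged) is not given there, so the simplified pooling-plus-sigmoid gates and the final summation below are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # Squeeze: global average pooling over the spatial dims gives a (C,)
    # descriptor; a sigmoid gate stands in for the paper's (unspecified) MLP.
    return sigmoid(feat.mean(axis=(1, 2)))[:, None, None]   # shape (C, 1, 1)

def spatial_attention(feat):
    # Average across channels gives an (H, W) map, again sigmoid-gated.
    return sigmoid(feat.mean(axis=0))[None, :, :]           # shape (1, H, W)

def ccbam_fuse(low, high):
    """Hypothetical CCBAM-style fusion: attention maps computed from one
    branch re-weight the *other* branch (the cross-multiply), so high-level
    semantics guide low-level detail and vice versa; the re-weighted
    features are then summed."""
    low_refined = low * channel_attention(high) * spatial_attention(high)
    high_refined = high * channel_attention(low) * spatial_attention(low)
    return low_refined + high_refined

# Toy example: two feature maps with C=4 channels on an 8x8 grid.
rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))
high = rng.standard_normal((4, 8, 8))   # assumed already upsampled to match
fused = ccbam_fuse(low, high)
print(fused.shape)  # (4, 8, 8)
```

In a real FPN decoder the high-level map would first be upsampled to the low-level resolution, and the gates would carry learned weights; the sketch only shows the data flow of the cross-multiplication.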



Data availability

Data available on request from the authors.


Author information


Corresponding author

Correspondence to Zhenhao Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Z., Xu, Z., Gu, X. et al. Cross-CBAM: a lightweight network for real-time scene segmentation. J Real-Time Image Proc 21, 38 (2024). https://doi.org/10.1007/s11554-024-01414-y

