Effi-Seg: Rethinking EfficientNet Architecture for Real-Time Semantic Segmentation

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14451)

Abstract

A popular strategy for designing a semantic segmentation model is to utilize a well-established pre-trained Deep Convolutional Neural Network (DCNN) as a feature extractor and replace the classification head with a decoder to generate segmented outputs. The advantage of this strategy is the ability to obtain a ready-made backbone with additional knowledge. However, there are several disadvantages, such as a lack of architectural knowledge, a significant semantic gap among the deep feature maps, and a lack of control over architectural changes to reduce memory overhead. To overcome these issues, we first study the complete architecture of EfficientNetV1 and EfficientNetV2, analyzing the architectural and performance gaps. Based on this analysis, we develop an efficient segmentation model called Effi-Seg by implementing several architectural changes to the backbone. This approach leads to better semantic segmentation results with improved efficiency. To enhance contextualization and achieve accurate object localization in the scene, we introduce a feature refinement module (FRM) and a semantic aggregation module (SAM) in the decoder. The complete segmentation network comprises only 1.49 million parameters and 8.4 GFLOPs. We evaluate the performance of the proposed model using three popular benchmarks, and it demonstrates highly competitive results on all three datasets while maintaining excellent efficiency.
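The parameter budget quoted above depends heavily on which convolutional blocks a backbone uses. As a rough illustration only, the following sketch counts convolution weights for the two block types that distinguish the EfficientNet families: the MBConv block (1x1 expansion, depthwise convolution, 1x1 projection) and the Fused-MBConv block used in the early stages of EfficientNetV2 (a single regular convolution followed by a 1x1 projection). The function names, the expansion ratio of 4, and the 24-channel example are assumptions chosen for illustration; they are not taken from the Effi-Seg paper, and the counts ignore biases, batch normalization, and squeeze-and-excitation layers.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias."""
    return (c_in // groups) * c_out * k * k


def mbconv_params(c_in, c_out, expand=4, k=3):
    """MBConv: 1x1 expand -> k x k depthwise -> 1x1 project (weights only)."""
    mid = c_in * expand
    return (
        conv_params(c_in, mid, 1)                # pointwise expansion
        + conv_params(mid, mid, k, groups=mid)   # depthwise convolution
        + conv_params(mid, c_out, 1)             # pointwise projection
    )


def fused_mbconv_params(c_in, c_out, expand=4, k=3):
    """Fused-MBConv: k x k regular conv (expand) -> 1x1 project (weights only)."""
    mid = c_in * expand
    return conv_params(c_in, mid, k) + conv_params(mid, c_out, 1)


if __name__ == "__main__":
    # Hypothetical 24-channel stage, chosen only to make the gap visible.
    print("MBConv weights:      ", mbconv_params(24, 24))
    print("Fused-MBConv weights:", fused_mbconv_params(24, 24))
```

Under these assumptions the fused block carries several times more weights than the depthwise variant at the same width, which is one reason block choice matters when targeting a budget on the order of a few million parameters.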

Author information

Corresponding author

Correspondence to Tanmay Singha.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singha, T., Pham, DS., Krishna, A. (2024). Effi-Seg: Rethinking EfficientNet Architecture for Real-Time Semantic Segmentation. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14451. Springer, Singapore. https://doi.org/10.1007/978-981-99-8073-4_5

  • DOI: https://doi.org/10.1007/978-981-99-8073-4_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8072-7

  • Online ISBN: 978-981-99-8073-4

  • eBook Packages: Computer Science (R0)
