Abstract
Various approaches have been proposed to design deep CNNs for semantic segmentation. They are usually built upon an encoder–decoder architecture and require computationally expensive operations on high-resolution activation maps. Because computational cost is critical for real-time segmentation, efficient approaches sacrifice spatial information to reach real-time speed, at the price of a considerable drop in accuracy. We introduce a new module based on depthwise separable, shuffled and grouped convolutions that optimizes up-sampling operations by using a sizeable receptive field and preserving spatial information. We then design an efficient network based on dense connectivity that achieves a remarkable trade-off between accuracy and speed. We show through a set of experiments that, even when up-sampling with a lightweight decoder, the proposed architecture achieves 69.5% mean IoU on the Cityscapes test set with \(1024\times 512\) inputs at 95.2 FPS.
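The three convolution variants named in the abstract can be compared by their weight counts, and the shuffle step can be shown as a simple permutation of channel indices. The sketch below is a generic illustration of those building blocks, not the module proposed in the article; the function names and the example sizes (a 3×3 kernel, 64 input and 128 output channels, 4 groups) are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_weights(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def grouped_conv_weights(k, c_in, c_out, groups):
    """Weights in a grouped convolution: each output channel sees c_in / groups inputs."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    return k * k * (c_in // groups) * c_out


def depthwise_separable_weights(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel) plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def channel_shuffle(channels, groups):
    """Interleave channels across groups so the next grouped layer mixes information.

    Equivalent to reshaping to (groups, n), transposing to (n, groups) and flattening.
    """
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by the number of groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to make the comparison concrete.
    k, c_in, c_out, groups = 3, 64, 128, 4
    print("standard:           ", standard_conv_weights(k, c_in, c_out))
    print("grouped (g=4):      ", grouped_conv_weights(k, c_in, c_out, groups))
    print("depthwise separable:", depthwise_separable_weights(k, c_in, c_out))
    print("shuffle of 0..5, g=2:", channel_shuffle(list(range(6)), 2))
```

With these assumed sizes, the grouped layer needs a quarter of the weights of the standard one, and the depthwise separable layer needs fewer still, which is why such operations are attractive for lightweight decoders.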






Cite this article
El Houfi, S., Majda, A. Efficient use of recent progresses for Real-time Semantic segmentation. Machine Vision and Applications 31, 45 (2020). https://doi.org/10.1007/s00138-020-01095-0
DOI: https://doi.org/10.1007/s00138-020-01095-0