ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time

Applied Intelligence

Abstract

Semantic segmentation can be considered a per-pixel localization and classification problem that assigns a meaningful label to each pixel in an input image. Deep convolutional neural networks have been extremely successful in semantic segmentation in recent years. However, some challenges remain. The first is that most current networks are complex and hard to deploy on mobile devices because of limited computational resources and memory. The second is obtaining more contextual information from downsampled feature maps. To address these challenges, we propose an asymmetric depthwise separable convolution network (ADSCNet), a lightweight neural network for real-time semantic segmentation. To facilitate information propagation, Dense Dilated Convolution Connections (DDCC), which connect a set of dilated convolutional layers in a dense way, are introduced in the network. A pooling operation is inserted before the ADSCNet unit to cover more contextual information in prediction. Extensive experimental results validate the superior performance of the proposed method compared with other network architectures. Our approach achieves a mean intersection over union (mIoU) of 67.5% on the Cityscapes dataset at 76.9 frames per second.
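To give a rough sense of why an asymmetric depthwise separable convolution is lightweight, the sketch below counts the weights of three layer types. It is illustrative only: the abstract does not spell out the ADSCNet unit, so the factorization assumed here (a k×1 depthwise convolution, a 1×k depthwise convolution, then a 1×1 pointwise convolution) is an assumption, as are the function names and the channel width of 64. Bias terms and normalization parameters are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def asymmetric_depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Assumed factorization: k x 1 depthwise, 1 x k depthwise, 1 x 1 pointwise."""
    depthwise_vertical = k * c_in     # k x 1 filter per channel
    depthwise_horizontal = k * c_in   # 1 x k filter per channel
    pointwise = c_in * c_out
    return depthwise_vertical + depthwise_horizontal + pointwise


if __name__ == "__main__":
    c = 64  # hypothetical channel width, chosen only for illustration
    print("standard:            ", standard_conv_params(c, c))
    print("depthwise separable: ", depthwise_separable_params(c, c))
    print("asymmetric separable:", asymmetric_depthwise_separable_params(c, c))
```

Under these assumptions, most of the saving comes from separating spatial filtering from channel mixing; the asymmetric split trims the depthwise part further, from k² to 2k weights per channel.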

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities of Central South University under grant 2017zzts730. We thank Xiangyu Zhang for helpful discussions.

Author information

Corresponding author

Correspondence to Hongyun Xiong.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, J., Xiong, H., Wang, H. et al. ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time. Appl Intell 50, 1045–1056 (2020). https://doi.org/10.1007/s10489-019-01587-1

  • DOI: https://doi.org/10.1007/s10489-019-01587-1
