Hybrid Dilated Convolution Network Using Attentive Kernels for Real-Time Semantic Segmentation

He, Jiankai; Jiang, Bin; Yang, Chao; Tu, Wenxuan

doi:10.1007/978-3-030-60633-6_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12305)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)


Abstract

Though current semantic segmentation methods achieve high accuracy, most of them suffer from low speed, massive memory usage, and high computational complexity. To avoid these problems, we propose a lightweight network called the Hybrid Dilated Convolution Network (HDCNet). HDCNet mainly consists of the Hybrid Scale-Aligned Block (HSAB) and the Attentive Depthwise Separable Block (ADSB). The HSAB uses multiple small-kernel convolutions with small dilation rates to extract local information and several large-kernel convolutions with large dilation rates to encode global information. We further explore the best way to match the kernel size to the dilation rate. The ADSB is designed to reduce redundant parameters and enhance critical information through depthwise separable convolution and mixed convolution kernels. In this way, ADSB and HSAB jointly encode multi-scale context information to improve model performance. We then combine the integrated local information with the global information to generate the final prediction. Extensive experiments on the Cityscapes dataset demonstrate that the proposed method achieves a better trade-off between accuracy and efficiency than other state-of-the-art methods. In particular, HDCNet obtains 72.82% mIoU with only 2.02M parameters and 16.8 GFLOPs.
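Two of the design ideas in the abstract can be checked with simple arithmetic: a k x k kernel with dilation rate d spans k + (k - 1)(d - 1) pixels, and a depthwise separable convolution replaces the c_in * c_out * k^2 weights of a standard convolution with c_in * k^2 + c_in * c_out. The Python sketch below works through both. It is not the authors' implementation: the kernel/dilation pairs, the 128-channel width, and the reading of "mixed convolution kernels" as an even channel split across kernel sizes (a MixConv-style assumption) are all hypothetical, since the abstract does not specify them.

```python
"""Back-of-the-envelope arithmetic for dilated and depthwise separable convolutions.

Illustrative only: every concrete number here is hypothetical, not from HDCNet.
"""
from typing import Sequence


def effective_kernel(k: int, d: int) -> int:
    """Span in pixels covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per channel) plus a 1x1 pointwise conv (bias omitted)."""
    return c_in * k * k + c_in * c_out


def mixed_depthwise_params(c_in: int, c_out: int, kernels: Sequence[int]) -> int:
    """Assumed MixConv-style layer: input channels are split evenly across the
    given kernel sizes for the depthwise stage, then a 1x1 pointwise conv follows."""
    if c_in % len(kernels) != 0:
        raise ValueError("c_in must divide evenly across the kernel sizes")
    group = c_in // len(kernels)
    depthwise = sum(group * k * k for k in kernels)
    return depthwise + c_in * c_out


if __name__ == "__main__":
    # Hypothetical branch settings (NOT taken from the paper): small kernels with
    # small dilation for local detail, large kernels with large dilation for context.
    branches = [(3, 1), (3, 2), (5, 4), (7, 8)]
    for k, d in branches:
        print(f"{k}x{k} kernel, dilation {d}: spans {effective_kernel(k, d)} px")

    c = 128  # hypothetical channel width
    print("standard 3x3:", standard_conv_params(c, c, 3))
    print("depthwise separable 3x3:", depthwise_separable_params(c, c, 3))
    print("mixed 3x3/5x5 depthwise + 1x1:", mixed_depthwise_params(c, c, [3, 5]))
```

Run as a script, it reports spans of 3, 5, 17, and 49 pixels for the four hypothetical branches, and 147,456 weights for a standard 3x3 layer versus 17,536 for its depthwise separable counterpart (about 8.4x fewer). Splitting the depthwise stage across 3x3 and 5x5 kernels adds only 1,024 weights (18,560 in total), because the 1x1 pointwise stage dominates the count.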

This work is partially supported by the National Natural Science Foundation of China under Grant No. 61702176.

The first author is a graduate student.



Author information

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
Jiankai He, Bin Jiang, Chao Yang & Wenxuan Tu


Corresponding author

Correspondence to Bin Jiang.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Yuxin Peng
Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Dalian University of Technology, Dalian, China
Huchuan Lu
Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Chinese Academy of Sciences, Beijing, China
Chenglin Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xilin Chen
Peking University, Beijing, China
Hongbin Zha
Nanjing University of Science and Technology, Nanjing, China
Jian Yang


About this paper

Cite this paper

He, J., Jiang, B., Yang, C., Tu, W. (2020). Hybrid Dilated Convolution Network Using Attentive Kernels for Real-Time Semantic Segmentation. In: Peng, Y., et al. Pattern Recognition and Computer Vision. PRCV 2020. Lecture Notes in Computer Science, vol 12305. Springer, Cham. https://doi.org/10.1007/978-3-030-60633-6_11


DOI: https://doi.org/10.1007/978-3-030-60633-6_11
Published: 11 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60632-9
Online ISBN: 978-3-030-60633-6
eBook Packages: Computer Science, Computer Science (R0)
