Refine for Semantic Segmentation Based on Parallel Convolutional Network with Attention Model

Neural Processing Letters

Abstract

High-precision semantic segmentation requires both global context and detailed local features, which ordinary convolutional neural networks struggle to exploit efficiently. To address this, this paper builds on the attention-to-scale method and proposes a novel attention model for semantic segmentation that aggregates multi-scale and context features to refine predictions. Specifically, the backbone convolutional neural network takes inputs at multiple scales, so that the CNN obtains representations at different scales. The proposed attention model processes the features from each scale stream separately and then integrates them. The location attention branch of the model learns to softly weight the multi-scale features at each pixel location. In addition, we add a recalibrating branch, in parallel with the output of the location attention branch, to recalibrate the score map per class. We achieve competitive results on the PASCAL VOC 2012 and ADE20K datasets, surpassing the baseline and related works.
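
The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fuse_multiscale`, the tensor layouts, and the use of a per-class sigmoid gate for the recalibrating branch are all assumptions made for illustration, and the attention logits are passed in directly rather than learned by a network.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_multiscale(score_maps, location_logits, class_logits):
    """Fuse per-scale score maps with location attention and class recalibration.

    score_maps:      (S, C, H, W) class scores from S scale streams, assumed
                     already resized to a common H x W resolution.
    location_logits: (S, H, W) unnormalised attention scores (hypothetical
                     output of a location attention branch).
    class_logits:    (C,) unnormalised per-class scores (hypothetical output
                     of a recalibrating branch; the sigmoid gate below is an
                     assumption, not taken from the paper).
    """
    # Soft weights over scales at every pixel; they sum to 1 along axis 0.
    location_weights = softmax(location_logits, axis=0)           # (S, H, W)

    # Weighted sum over scales, broadcasting the weights across classes.
    fused = (location_weights[:, None, :, :] * score_maps).sum(axis=0)  # (C, H, W)

    # Per-class gate in (0, 1) that rescales each class's score map.
    class_gates = 1.0 / (1.0 + np.exp(-class_logits))             # (C,)
    recalibrated = fused * class_gates[:, None, None]

    return recalibrated, location_weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(3, 21, 8, 8))   # 3 scales, 21 classes (as in PASCAL VOC)
    refined, weights = fuse_multiscale(
        scores, rng.normal(size=(3, 8, 8)), rng.normal(size=21)
    )
    print(refined.shape, weights.shape)
```

Taking the softmax over the scale axis means every pixel distributes a total weight of 1 across the scale streams, while the per-class gate acts on whole score maps independently of location.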

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61672244) and the Hubei Province Natural Science Foundation of China (No. 2019CFB526).

Author information

Corresponding author

Correspondence to Hao Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peng Gang, PhD, Assoc. Prof., IEEE member; Yang Shiqi (co-first author), Master; Wang Hao (corresponding author), Master's graduate student.

About this article

Cite this article

Peng, G., Yang, S. & Wang, H. Refine for Semantic Segmentation Based on Parallel Convolutional Network with Attention Model. Neural Process Lett 53, 4177–4188 (2021). https://doi.org/10.1007/s11063-021-10587-7

  • DOI: https://doi.org/10.1007/s11063-021-10587-7
