Abstract
High-precision semantic segmentation requires both global context and detailed local features, which ordinary convolutional neural networks (CNNs) struggle to exploit efficiently. To address this, this paper builds on the attention-to-scale method and proposes a novel attention model for semantic segmentation that aggregates multi-scale and context features to refine the prediction. Specifically, the backbone CNN takes the input at multiple scales, so that it produces representations at different scales. The proposed attention model processes the features from each scale stream separately and then integrates them. A location attention branch learns to softly weight the multi-scale features at each pixel location. In addition, we add a recalibrating branch, in parallel with the output of the location attention branch, to recalibrate the score map per class. We achieve competitive results on the PASCAL VOC 2012 and ADE20K datasets, surpassing the baseline and related work.
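The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fuse_multiscale`, the tensor shapes, and the random inputs are hypothetical, and the recalibrating branch is assumed here to act as a per-class sigmoid gate, which the abstract does not specify. The location attention is modeled as a softmax over scales at each pixel, following the attention-to-scale idea.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_multiscale(scores, attn_logits, class_logits):
    """Fuse per-scale score maps with location attention, then rescale per class.

    scores:       (S, C, H, W) score maps from S scale streams, resized to a common H x W
    attn_logits:  (S, H, W) location-attention logits (hypothetical branch output)
    class_logits: (C,) recalibration logits (hypothetical branch output)
    """
    # Soft weights over scales at each pixel; they sum to 1 along the scale axis.
    weights = softmax(attn_logits, axis=0)                  # (S, H, W)
    # Weighted sum of the score maps over the scale axis.
    fused = (weights[:, None, :, :] * scores).sum(axis=0)   # (C, H, W)
    # Assumed form of recalibration: one sigmoid gate per class channel.
    gate = 1.0 / (1.0 + np.exp(-class_logits))              # (C,)
    return fused * gate[:, None, None], weights


rng = np.random.default_rng(0)
S, C, H, W = 3, 21, 4, 4  # 21 = 20 PASCAL VOC object classes + background
scores = rng.normal(size=(S, C, H, W))
attn_logits = rng.normal(size=(S, H, W))
class_logits = rng.normal(size=C)

refined, weights = fuse_multiscale(scores, attn_logits, class_logits)
prediction = refined.argmax(axis=0)  # (H, W) map of predicted class indices
```

In a trained network the attention and recalibration logits would come from learned branches rather than random draws; the sketch only shows how the two sets of weights combine with the per-scale score maps.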
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61672244) and the Hubei Province Natural Science Foundation of China (No. 2019CFB526).
Additional information
Peng Gang, PhD, Assoc. Prof., IEEE member; Yang Shiqi (Co-First Author), Master; Wang Hao (Corresponding Author), Master's graduate student.
About this article
Cite this article
Peng, G., Yang, S. & Wang, H. Refine for Semantic Segmentation Based on Parallel Convolutional Network with Attention Model. Neural Process Lett 53, 4177–4188 (2021). https://doi.org/10.1007/s11063-021-10587-7
DOI: https://doi.org/10.1007/s11063-021-10587-7