Abstract
A major challenge in image semantic segmentation is the loss of object detail caused by the extensive use of convolution and pooling operations, which blurs edges and lines and causes small objects to be missed. To address this problem, we propose the sensitive feature selection module (SFSM), which uses the feature maps from preceding convolution layers to learn, for each spatial location, how the pixel values are distributed across channels. The learned weights are then used to reweight each pixel on each channel, so that the network focuses better on object boundaries and small objects during feature extraction. Finally, the information obtained by SFSM is combined with the original features to further improve the feature representation and yield more accurate segmentation results. Experimental results show that SFSM improves the performance of semantic segmentation networks. Integrating SFSM into FCN and DeepLabv3 yields accuracy improvements of 0.21% and 0.6%, respectively, on the PASCAL VOC 2012 dataset. On the Cityscapes dataset, where the segmentation task is relatively complicated, our improved networks still achieve excellent performance. To further verify that the module is not restricted to specific networks or datasets, we embed it into DoubleU-Net for medical image segmentation on the ISIC-2018 dataset and obtain a 0.16% accuracy improvement.
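The reweight-and-combine idea described in the abstract can be illustrated with a minimal sketch. This is not the SFSM implementation: the abstract does not specify how the weights are learned, so the sketch assumes a plain softmax over the channel values at each location as a stand-in, and assumes the combination with the original features is an element-wise sum. The function names `channel_softmax` and `sfsm_like_reweight` are hypothetical, and the feature map is a nested list indexed as [channel][row][column] to keep the example dependency-free.

```python
import math


def channel_softmax(features, y, x):
    """Softmax over the channel values at one spatial location (y, x).

    ASSUMPTION: a fixed softmax stands in for the learned weighting,
    which the abstract does not describe in detail.
    """
    values = [channel[y][x] for channel in features]
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sfsm_like_reweight(features):
    """Reweight each pixel across channels, then add the original features.

    features: nested list indexed as [channel][row][column].
    Returns a new nested list of the same shape.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            weights = channel_softmax(features, y, x)
            for c in range(channels):
                original = features[c][y][x]
                # ASSUMPTION: combine by summing the reweighted value
                # with the original one (a residual-style connection).
                out[c][y][x] = original + weights[c] * original
    return out


if __name__ == "__main__":
    # Two channels, one row, two columns.
    demo = [[[1.0, 0.0]], [[1.0, 0.0]]]
    print(sfsm_like_reweight(demo))
```

In a real network the same computation would be expressed as tensor operations with a learned weighting branch; the loops here only make the per-location, per-channel structure explicit.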
Acknowledgements
This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC, the Social Science Research of the Education Department of Jilin Province (JJKH20210901SK), the Jilin Educational Scientific Research Leading Group (ZD21003), and the Humanities and Social Science Foundation of Changchun Normal University (2020[011]).
Ethics declarations
Conflict of interest
The authors declare that no conflicts of interest or competing interests exist.
About this article
Cite this article
Gao, Y., Che, X., Liu, Q. et al. SFSM: sensitive feature selection module for image semantic segmentation. Multimed Tools Appl 82, 13905–13927 (2023). https://doi.org/10.1007/s11042-022-13901-0
DOI: https://doi.org/10.1007/s11042-022-13901-0