SSA: semantic structure aware inference on CNN networks for weakly pixel-wise dense predictions without cost

Sun, Yanpeng; Li, Zechao

doi:10.1007/s11704-024-3571-9

Research Article
Published: 18 November 2024

Volume 19, article number 192702, (2025)

Frontiers of Computer Science

Abstract

Pixel-wise dense prediction tasks under weak supervision currently rely on Class Activation Maps (CAMs) to generate pseudo masks that serve as ground truth. However, existing methods often add trainable modules to expand the immature CAMs, which can incur significant computational overhead and complicate training. In this work, we investigate the semantic structure information concealed within CNNs and propose a semantic structure aware inference (SSA) method that uses this information to obtain high-quality CAMs without any additional training cost. Specifically, a semantic structure modeling module (SSM) is first proposed to generate a class-agnostic semantic correlation representation, in which each item denotes the degree of affinity between one category of objects and all the others. Then, the immature CAMs are refined through a dot product operation that uses the semantic structure information. Finally, the polished CAMs from different backbone stages are fused as the output. Because SSA is parameter-free and requires no additional training, it is suitable for various weakly supervised pixel-wise dense prediction tasks. We conducted extensive experiments on weakly supervised object localization and weakly supervised semantic segmentation, and the results confirm the effectiveness of SSA.
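
The three steps named in the abstract (build a class-agnostic correlation, refine the CAM with a dot product, fuse across stages) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the correlation is computed or how the stages are fused, so the sketch assumes cosine similarity between per-position feature vectors, row normalisation of the affinity, and a plain average followed by min-max scaling for fusion. It also assumes the feature maps from all stages have already been resized to the CAM resolution. The function names `semantic_structure`, `refine_cam` and `fuse` are hypothetical.

```python
import numpy as np


def semantic_structure(features):
    """Class-agnostic affinity between all spatial positions of a (C, H, W) map.

    Assumption: cosine similarity with negatives clipped and rows normalised
    to sum to 1. The paper's actual formulation may differ.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)                    # (C, HW)
    norms = np.linalg.norm(flat, axis=0, keepdims=True)  # (1, HW)
    unit = flat / np.maximum(norms, 1e-8)
    affinity = np.clip(unit.T @ unit, 0.0, None)         # (HW, HW)
    row_sum = affinity.sum(axis=1, keepdims=True)
    return affinity / np.maximum(row_sum, 1e-8)


def refine_cam(cam, affinity):
    """Propagate CAM scores along the affinity with a matrix-vector product."""
    h, w = cam.shape
    return (affinity @ cam.reshape(-1)).reshape(h, w)


def fuse(cams):
    """Assumed fusion rule: average the refined CAMs, then min-max scale."""
    fused = np.mean(np.stack(cams), axis=0)
    lo, hi = fused.min(), fused.max()
    return (fused - lo) / (hi - lo) if hi > lo else np.zeros_like(fused)


# Toy usage with random data standing in for a coarse CAM and two stages.
rng = np.random.default_rng(0)
cam = rng.random((7, 7))
stage_feats = [rng.standard_normal((16, 7, 7)),
               rng.standard_normal((32, 7, 7))]
refined = [refine_cam(cam, semantic_structure(f)) for f in stage_feats]
out = fuse(refined)
```

Nothing in this sketch is trained: every quantity is derived from features the backbone already produces, which is the sense in which the abstract describes SSA as parameter-free.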

Acknowledgements

This work was partially supported by the National Key R&D Program of China (2022ZD0118802) and the National Natural Science Foundation of China (Grant Nos. U20B2064 and U21B2043).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210014, China
Yanpeng Sun & Zechao Li

Corresponding author

Correspondence to Zechao Li.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Yanpeng Sun received the MS degree from Guilin University of Electronic Technology, China in 2019. He is currently pursuing the PhD degree with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. His research interests include deep learning, visual segmentation and understanding.

Zechao Li is currently a professor at Nanjing University of Science and Technology, China. He received his PhD degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China in 2013, and his BE degree from the University of Science and Technology of China, China in 2008. His research interests include big media analysis and computer vision. He serves as an Associate Editor for IEEE TNNLS.

Electronic supplementary material