SSA: semantic structure aware inference on CNN networks for weakly pixel-wise dense predictions without cost

  • Research Article
Frontiers of Computer Science

Abstract

Weakly supervised pixel-wise dense prediction tasks currently use Class Activation Maps (CAMs) to generate pseudo masks that serve as ground truth. However, existing methods often incorporate trainable modules to expand the immature CAMs, which can add significant computational overhead and complicate training. In this work, we investigate the semantic structure information concealed within CNNs and propose a semantic structure aware inference (SSA) method that uses this information to obtain high-quality CAMs without any additional training cost. Specifically, a semantic structure modeling module (SSM) is first proposed to generate a class-agnostic semantic correlation representation, in which each item denotes the degree of affinity between one category of objects and all the others. Then, the immature CAMs are refined through a dot product operation that uses the semantic structure information. Finally, the polished CAMs from different backbone stages are fused to form the output. Because SSA is parameter-free and incurs no additional training cost, it is suitable for a variety of weakly supervised pixel-wise dense prediction tasks. We conducted extensive experiments on weakly supervised object localization and weakly supervised semantic segmentation, and the results confirm the effectiveness of SSA.
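The three steps named in the abstract (build a semantic correlation representation, refine the CAM with a dot product, fuse across stages) can be illustrated with a minimal, parameter-free sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: the affinity is taken to be clipped cosine similarity between per-position feature vectors with row normalisation, fusion is taken to be a plain average followed by min-max scaling, and the function names `semantic_affinity`, `refine_cam` and `fuse_stages` are hypothetical. Spatial positions are flattened into a list, and all stages are assumed to share the same resolution.

```python
import math


def semantic_affinity(features):
    """Build an N x N affinity matrix from N per-position feature vectors.

    Assumed form: cosine similarity, negatives clipped to 0, rows normalised
    to sum to 1. The paper's actual SSM may differ.
    """
    unit = []
    for vec in features:
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard zero vectors
        unit.append([v / norm for v in vec])
    affinity = []
    for a in unit:
        row = [max(0.0, sum(x * y for x, y in zip(a, b))) for b in unit]
        total = sum(row) or 1.0
        affinity.append([r / total for r in row])
    return affinity


def refine_cam(cam, affinity):
    """Refine one class's flattened CAM by a dot product with each affinity row."""
    return [sum(w * s for w, s in zip(row, cam)) for row in affinity]


def fuse_stages(cams):
    """Average refined CAMs from several stages, then min-max scale to [0, 1]."""
    n = len(cams[0])
    mean = [sum(c[i] for c in cams) / len(cams) for i in range(n)]
    lo, hi = min(mean), max(mean)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [(m - lo) / span for m in mean]


# Toy input: positions 0 and 1 belong to the object, 2 and 3 to background.
# The immature CAM fires only on position 0 (the most discriminative part).
cam = [1.0, 0.0, 0.0, 0.0]
stage_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
stage_b = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 1.0]]

refined = [refine_cam(cam, semantic_affinity(f)) for f in (stage_a, stage_b)]
fused = fuse_stages(refined)
print(fused)  # → [1.0, 1.0, 0.0, 0.0]
```

In this toy case the activation spreads from position 0 to position 1 because the two positions have identical feature directions, while the background positions stay at zero. No parameters are learned at any point, which is the property the abstract emphasises.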

Acknowledgements

This work was partially supported by the National Key R&D Program of China (2022ZD0118802) and the National Natural Science Foundation of China (Grant Nos. U20B2064 and U21B2043).

Author information

Corresponding author

Correspondence to Zechao Li.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Yanpeng Sun received the MS degree from Guilin University of Electronic Technology, China in 2019. He is currently pursuing the PhD degree with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. His research interests include deep learning, visual segmentation and understanding.

Zechao Li is currently a professor at Nanjing University of Science and Technology, China. He received his PhD degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China in 2013, and his BE degree from the University of Science and Technology of China, China in 2008. His research interests include big media analysis and computer vision. He serves as an Associate Editor for IEEE TNNLS.

Cite this article

Sun, Y., Li, Z. SSA: semantic structure aware inference on CNN networks for weakly pixel-wise dense predictions without cost. Front. Comput. Sci. 19, 192702 (2025). https://doi.org/10.1007/s11704-024-3571-9

  • DOI: https://doi.org/10.1007/s11704-024-3571-9
