Research Article
DOI: 10.1145/3581783.3612210

Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation

Published: 27 October 2023

Abstract

Scene graph generation is heavily constrained by class imbalance. Previous methods alleviate this problem by incorporating commonsense information into the classification, enabling the prediction model to rectify an incorrect head class into the correct tail class. However, commonsense-based models typically overcorrect, e.g., a visually correct head class is forcibly modified into a wrong tail class. We argue that there are two principal reasons for this phenomenon. First, existing models ignore the semantic gap between commonsense knowledge and real scenes. Second, current commonsense fusion strategies propagate information only between neighbors in the visual-linguistic context, without modeling long-range correlation. To alleviate overcorrection, we formulate commonsense-based scene graph generation as two sub-problems: scene-induced commonsense graph generation (SI-CGG) and commonsense-inspired scene graph generation (CI-SGG). In the SI-CGG module, unlike conventional methods that use a fixed commonsense graph, we adaptively adjust the node embeddings of the commonsense graph according to their visual appearance and configure new reasoning edges under the specific visual context. The CI-SGG module propagates information from the scene-induced commonsense graph back to the scene graph: it updates the representation of each scene graph node by aggregating neighborhood information at different scales. Through maximum likelihood optimization of the logarithmic Gaussian process, the scene graph automatically adapts to different neighbors in the visual-linguistic context. Systematic experiments on the Visual Genome dataset show that our full method achieves state-of-the-art performance.
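
The abstract describes the CI-SGG update as an aggregation of neighborhood information at several scales with scale-dependent weighting. The paper page includes no code, so the following PyTorch fragment is only a minimal sketch of that general idea under stated assumptions; the class name MultiScaleAggregator, the number of scales, the scale-specific projections, and the residual update are all illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of multi-scale neighborhood aggregation, loosely
# inspired by the CI-SGG description above; NOT the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleAggregator(nn.Module):
    """Aggregates node features over 1..K-hop neighborhoods and mixes
    the scales with learned (softmax-normalized) weights."""

    def __init__(self, dim: int, num_scales: int = 3):
        super().__init__()
        self.num_scales = num_scales
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_scales)])
        self.scale_logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; adj: (N, N) dense 0/1 adjacency matrix.
        # Row-normalize the adjacency so each hop is an averaging step.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        a_norm = adj / deg

        hops, h = [], x
        for k in range(self.num_scales):
            h = a_norm @ h                   # one more hop of propagation
            hops.append(self.proj[k](h))     # scale-specific projection

        weights = F.softmax(self.scale_logits, dim=0)
        mixed = sum(w * hk for w, hk in zip(weights, hops))
        return x + mixed                     # residual update of node states


if __name__ == "__main__":
    # Toy usage: 5 scene-graph nodes with 16-d features on a chain graph.
    x = torch.randn(5, 16)
    adj = torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
    out = MultiScaleAggregator(dim=16, num_scales=3)(x, adj)
    print(out.shape)  # torch.Size([5, 16])
```

In this sketch the learned softmax weights decide how much each hop distance contributes, which is one simple way a model could adapt to different neighborhoods rather than treating all scales equally.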

Supplemental Material

MP4 File
Presentation video



    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. overcorrection
    2. scene graph generation
    3. scene-induced commonsense graph
    4. visual-linguistic context

Conference
MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
