research-article

Dark Knowledge Balance Learning for Unbiased Scene Graph Generation

Authors:

Jun Xiao

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 4838 - 4847

https://doi.org/10.1145/3581783.3612031

Published: 27 October 2023

Abstract

One of the major obstacles hindering current scene graph generation (SGG) performance is severe predicate annotation bias. Conventional solutions to this problem are mainly based on reweighting/resampling heuristics. Despite achieving some improvements on tail classes, these methods are prone to serious performance degradation on head predicates. In this paper, we propose to tackle this problem from a brand-new perspective of dark knowledge. Considering that SGG requires a large number of negative samples for predicate learning, we capitalize on the dark knowledge contained in negative samples to debias the predicate distribution. In this vein, we propose a novel SGG method dubbed Dark Knowledge Balance Learning (DKBL). In DKBL, we first design a dark knowledge balancing loss, which helps the model learn to balance head and tail predicates while maintaining overall performance. We further introduce a dark knowledge semantic enhancement module to better encode the semantics of predicates. DKBL is orthogonal to existing SGG methods and can be easily plugged into their training process for further improvement. Extensive experiments on the Visual Genome (VG) dataset show that the proposed DKBL consistently achieves a good trade-off between head and tail predicates, significantly better than previous state-of-the-art methods. The code is available at https://github.com/chenzqing/DKBL.
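
As a rough illustration of the term only: in the knowledge-distillation literature, "dark knowledge" usually refers to the relative probabilities a model assigns to the non-target classes, often exposed with a temperature-scaled softmax. The sketch below computes that non-target distribution for a single negative (background) subject-object pair. It is not the DKBL loss, whose formulation the abstract does not give; the class names, logits, temperature, and the helper `dark_knowledge` are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dark_knowledge(logits, target_index, temperature=1.0):
    """Probabilities over the non-target classes only, renormalized to sum to 1.

    Hypothetical helper for illustration; not taken from the DKBL code base.
    """
    probs = softmax(logits, temperature)
    rest = [p for i, p in enumerate(probs) if i != target_index]
    total = sum(rest)
    return [p / total for p in rest]

# Assumed predicate logits for one negative (background) pair.
# Index 0 is the background class; the other classes are illustrative.
classes = ["background", "on", "sitting on", "riding"]
logits = [4.0, 2.0, 1.0, -1.0]

dk = dark_knowledge(logits, target_index=0, temperature=2.0)
for name, p in zip(classes[1:], dk):
    print(f"{name}: {p:.3f}")
```

Even though the pair is labeled as background, the renormalized distribution still ranks the foreground predicates against each other, which is the kind of signal the abstract proposes to exploit for debiasing.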

Cited By

Zhang R, Shang Z, Wang F, Yang Z, Cao S, Cen Y, An G, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Synergetic Prototype Learning Network for Unbiased Scene Graph Generation. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 945-954. DOI: 10.1145/3664647.3680973. Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3680973
Zhu X, Xing Y, Wang R, Wang Y, Lan X, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Calibration for Long-tailed Scene Graph Generation. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 3037-3046. DOI: 10.1145/3664647.3680818. Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3680818
Zhao B, Zhang L, Zhang L, Mao Z (2024). Neighborhood-Adaptive Context Enhancement Learning For Scene Graph Generation. 2024 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6. DOI: 10.1109/ICME57554.2024.10688027. Online publication date: 15-Jul-2024
https://doi.org/10.1109/ICME57554.2024.10688027

Index Terms

Dark Knowledge Balance Learning for Unbiased Scene Graph Generation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities
National Key Research & Development Project of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 192
Downloads (Last 12 months): 78
Downloads (Last 6 weeks): 6

Reflects downloads up to 27 Feb 2025

