DOI: 10.1145/3581783.3612031
research-article

Dark Knowledge Balance Learning for Unbiased Scene Graph Generation

Published: 27 October 2023

Abstract

One of the major obstacles to current scene graph generation (SGG) performance is severe predicate annotation bias. Conventional solutions to this problem are mainly based on reweighting or resampling heuristics. Although these methods achieve some improvement on tail classes, they tend to cause serious performance degradation on head predicates. In this paper, we tackle the problem from a new perspective: dark knowledge. Because SGG requires a large number of negative samples for predicate learning, we propose to capitalize on the dark knowledge contained in negative samples to debias the predicate distribution. In this vein, we propose a novel SGG method dubbed Dark Knowledge Balance Learning (DKBL). In DKBL, we first design a dark knowledge balancing loss, which helps the model learn to balance head and tail predicates while maintaining overall performance. We further introduce a dark knowledge semantic enhancement module to better encode the semantics of predicates. DKBL is orthogonal to existing SGG methods and can be easily plugged into their training process for further improvement. Extensive experiments on the Visual Genome (VG) dataset show that DKBL consistently achieves a good trade-off between head- and tail-predicate performance, significantly better than previous state-of-the-art methods. The code is available at https://github.com/chenzqing/DKBL.

Cited By

  • (2024) Synergetic Prototype Learning Network for Unbiased Scene Graph Generation. Proceedings of the 32nd ACM International Conference on Multimedia, 945-954. DOI: 10.1145/3664647.3680973. Online publication date: 28-Oct-2024
  • (2024) Calibration for Long-tailed Scene Graph Generation. Proceedings of the 32nd ACM International Conference on Multimedia, 3037-3046. DOI: 10.1145/3664647.3680818. Online publication date: 28-Oct-2024
  • (2024) Neighborhood-Adaptive Context Enhancement Learning For Scene Graph Generation. 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6. DOI: 10.1109/ICME57554.2024.10688027. Online publication date: 15-Jul-2024

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dark knowledge
    2. long-tailed distribution
    3. scene graph generation

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Central Universities
    • National Key Research & Development Project of China

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 78
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 27 Feb 2025
