Abstract
Despite remarkable recent advances in Scene Graph Generation (SGG), precisely capturing and modeling long-tail object relationships remains a persistent challenge. Conventional methods typically employ resampling and reweighting techniques to obtain unbiased predictions. Existing reweighting methods in SGG compute weights from the class distribution of the dataset and reweight only samples whose objects are related. However, the sample distribution observed during training is inconsistent with the dataset's class distribution, and samples whose objects are unrelated should not be excluded from reweighting. In this paper, we propose a novel method named Dynamic Reweighting based on the Sample Distribution (DRSD), which computes class weights from the sample distribution observed during training and additionally reweights samples whose objects are unrelated. Specifically, we use a sample queue mechanism to record and update the sample distribution, and introduce a transition mechanism to ensure training stability. Experiments on the Visual Genome dataset demonstrate the effectiveness of our method. DRSD is model-agnostic and yields significant performance improvements on three benchmark models (Motif, VCTree, and Transformer): on the mR@100 metric for the Predicate Classification task, it improves them by \(23.4\%\), \(25.1\%\), and \(27.6\%\), reaching \(40.9\%\), \(41.2\%\), and \(43.4\%\), respectively. Moreover, our method outperforms FGPL, the state-of-the-art reweighting method in SGG, by \(3\%\).
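The core idea of the abstract — deriving class weights from a queue of recently seen training samples rather than from the static dataset distribution — can be sketched as follows. This is a minimal, hypothetical illustration: the class and method names are ours, and the inverse-frequency weighting form is an assumption, since the abstract does not give the paper's exact weighting formula or transition mechanism.

```python
from collections import Counter, deque

class SampleQueueReweighter:
    """Hypothetical sketch of sample-distribution-based reweighting:
    keep a fixed-length queue of recently seen predicate labels and
    derive per-class weights from that running sample distribution
    (here, inverse frequency) instead of from the dataset's static
    class distribution."""

    def __init__(self, num_classes, queue_size=1000, smoothing=1.0):
        self.num_classes = num_classes
        self.queue = deque(maxlen=queue_size)  # oldest samples fall out
        self.smoothing = smoothing             # avoids division by zero

    def update(self, labels):
        """Record the predicate labels of the current training batch."""
        self.queue.extend(labels)

    def weights(self):
        """Inverse-frequency weights over the queued samples,
        normalized so the mean weight is 1."""
        counts = Counter(self.queue)
        raw = [1.0 / (counts.get(c, 0) + self.smoothing)
               for c in range(self.num_classes)]
        mean = sum(raw) / len(raw)
        return [w / mean for w in raw]

# usage: a frequent class (0) receives a smaller weight than a rare one (2)
rw = SampleQueueReweighter(num_classes=3)
rw.update([0, 0, 0, 0, 1, 1, 2])
w = rw.weights()
assert w[0] < w[1] < w[2]
```

Because the queue has a fixed length, the weights track the distribution of *recent* training samples and adapt as it drifts, which is the property the abstract contrasts with dataset-level reweighting.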
References
Abedi, A., Karshenas, H., Adibi, P.: Multi-modal reward for visual relationships-based image captioning. arXiv preprint arXiv:2303.10766 (2023)
Chen, S., Jin, Q., Wang, P., Wu, Q.: Say as you wish: fine-grained control of image caption generation with abstract scene graphs. In: CVPR (2020)
Chen, T., Yu, W., Chen, R., Lin, L.: Knowledge-embedded routing network for scene graph generation. In: CVPR (2019)
Deng, Y., et al.: Hierarchical memory learning for fine-grained scene graph generation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV. LNCS, vol. 13687, pp. 266–283. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_16
Dong, X., Gan, T., Song, X., Wu, J., Cheng, Y., Nie, L.: Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation. In: CVPR (2022)
Guo, Y., Chen, J., Zhang, H., Jiang, Y.G.: Visual relations augmented cross-modal retrieval. In: ICMR (2020)
Guo, Y., et al.: From general to specific: informative scene graph generation via balance adjustment. In: ICCV (2021)
Hildebrandt, M., Li, H., Koner, R., Tresp, V., Günnemann, S.: Scene graph reasoning for visual question answering. arXiv preprint arXiv:2007.01072 (2020)
Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. IJCV (2017)
Lertnattee, V., Theeramunkong, T.: Analysis of inverse class frequency in centroid-based text classification. In: ISCIT (2004)
Li, M., Qi, Y.: XPNet: cross-domain prototypical network for zero-shot sketch-based image retrieval. In: Yu, S., et al. (eds.) PRCV. LNCS, vol. 13534, pp. 394–410. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-18907-4_31
Li, R., Zhang, S., Wan, B., He, X.: Bipartite graph network with adaptive message passing for unbiased scene graph generation. In: CVPR (2021)
Lyu, X., et al.: Fine-grained predicates learning for scene graph generation. In: CVPR (2022)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP (2014)
Schroeder, B., Tripathi, S.: Structured query-based image retrieval using scene graphs. In: CVPRW (2020)
Song, J., Zeng, P., Gao, L., Shen, H.T.: From pixels to objects: cubic visual attention for visual question answering. arXiv preprint arXiv:2206.01923 (2022)
Song, X., Chen, J., Wu, Z., Jiang, Y.G.: Spatial-temporal graphs for cross-modal Text2Video retrieval. IEEE T-MM (2021)
Tang, K.: A scene graph generation codebase in Pytorch (2020). https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
Tang, K., Niu, Y., Huang, J., Shi, J., Zhang, H.: Unbiased scene graph generation from biased training. In: CVPR (2020)
Tang, K., Zhang, H., Wu, B., Luo, W., Liu, W.: Learning to compose dynamic tree structures for visual contexts. In: CVPR (2019)
Teney, D., Liu, L., van Den Hengel, A.: Graph-structured representations for visual question answering. In: CVPR (2017)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Wang, J., et al.: Seesaw loss for long-tailed instance segmentation. In: CVPR (2021)
Xu, P., Chang, X., Guo, L., Huang, P.Y., Chen, X., Hauptmann, A.G.: A survey of scene graph: generation and application. TNNLS (2020)
Yan, S., et al.: PCPL: predicate-correlation perception learning for unbiased scene graph generation. In: ACM MM (2020)
Yang, X., et al.: Transforming visual scene graphs to image captions. arXiv preprint arXiv:2305.02177 (2023)
Yang, X., Tang, K., Zhang, H., Cai, J.: Auto-encoding scene graphs for image captioning. In: CVPR (2019)
Yu, J., Chai, Y., Wang, Y., Hu, Y., Wu, Q.: CogTree: cognition tree loss for unbiased scene graph generation. arXiv preprint arXiv:2009.07526 (2020)
Zellers, R., Yatskar, M., Thomson, S., Choi, Y.: Neural motifs: scene graph parsing with global context. In: CVPR (2018)
Zhang, A., et al.: Fine-grained scene graph generation with data transfer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV. LNCS, vol. 13687, pp. 409–424. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_24
Zhang, C., Chao, W.L., Xuan, D.: An empirical study on leveraging scene graphs for visual question answering. arXiv preprint arXiv:1907.12133 (2019)
Zhou, X., Li, S., Chen, H., Zhu, A.: Disentangled OCR: a more granular information for “text”-to-image retrieval. In: PRCV (2022)
Acknowledgments
This work was supported by the National Key R&D Program of China under Grant 2022ZD0115502, by the National Natural Science Foundation of China under Grants U21A20514 and 62122010, and by the FuXiaQuan National Independent Innovation Demonstration Zone Collaborative Innovation Platform Project under Grant 3502ZCQXT2022008.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Hu, L., Liu, S., Wang, H. (2024). An Effective Dynamic Reweighting Method for Unbiased Scene Graph Generation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_28
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9