Abstract
Despite remarkable recent advances in Scene Graph Generation (SGG), precisely capturing and modeling long-tail object relationships remains a persistent challenge. Conventional methods typically employ resampling and reweighting techniques to obtain unbiased predictions. Existing reweighting methods in SGG compute weights from the class distribution of the dataset and reweight only samples whose objects are related. However, the sample distribution observed during training is inconsistent with the dataset's class distribution, and samples whose objects are unrelated should not be excluded from reweighting. In this paper, we propose a novel method named Dynamic Reweighting based on the Sample Distribution (DRSD), which computes class weights from the sample distribution observed during training and additionally reweights samples whose objects are unrelated. Specifically, we use a sample queue mechanism to record and update the sample distribution, and introduce a transition mechanism to ensure training stability. Experiments on the Visual Genome dataset demonstrate the effectiveness of our method. DRSD is model-agnostic and yields significant performance improvements on three benchmark models (Motif, VCTree, and Transformer): on the mR@100 metric for the Predicate Classification task, it improves them by \(23.4\%\), \(25.1\%\), and \(27.6\%\), reaching \(40.9\%\), \(41.2\%\), and \(43.4\%\), respectively. Moreover, our method outperforms FGPL, the state-of-the-art reweighting method in SGG, by \(3\%\).
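The core idea of the abstract — deriving class weights from a queue of recently seen training samples rather than from the static dataset distribution — can be sketched as follows. This is a minimal, hypothetical illustration: the class and method names are ours, and the inverse-frequency weighting form is an assumption, since the abstract does not give the paper's exact weighting formula or transition mechanism.

```python
from collections import Counter, deque

class SampleQueueReweighter:
    """Hypothetical sketch of sample-distribution-based reweighting:
    keep a fixed-length queue of recently seen predicate labels and
    derive per-class weights from that running sample distribution
    (here, inverse frequency) instead of from the dataset's static
    class distribution."""

    def __init__(self, num_classes, queue_size=1000, smoothing=1.0):
        self.num_classes = num_classes
        self.queue = deque(maxlen=queue_size)  # oldest samples fall out
        self.smoothing = smoothing             # avoids division by zero

    def update(self, labels):
        """Record the predicate labels of the current training batch."""
        self.queue.extend(labels)

    def weights(self):
        """Inverse-frequency weights over the queued samples,
        normalized so the mean weight is 1."""
        counts = Counter(self.queue)
        raw = [1.0 / (counts.get(c, 0) + self.smoothing)
               for c in range(self.num_classes)]
        mean = sum(raw) / len(raw)
        return [w / mean for w in raw]

# usage: a frequent class (0) receives a smaller weight than a rare one (2)
rw = SampleQueueReweighter(num_classes=3)
rw.update([0, 0, 0, 0, 1, 1, 2])
w = rw.weights()
assert w[0] < w[1] < w[2]
```

Because the queue has a fixed length, the weights track the distribution of *recent* training samples and adapt as it drifts, which is the property the abstract contrasts with dataset-level reweighting.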
References
Abedi, A., Karshenas, H., Adibi, P.: Multi-modal reward for visual relationships-based image captioning. arXiv preprint arXiv:2303.10766 (2023)
Chen, S., Jin, Q., Wang, P., Wu, Q.: Say as you wish: fine-grained control of image caption generation with abstract scene graphs. In: CVPR (2020)
Chen, T., Yu, W., Chen, R., Lin, L.: Knowledge-embedded routing network for scene graph generation. In: CVPR (2019)
Deng, Y., et al.: Hierarchical memory learning for fine-grained scene graph generation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV. LNCS, vol. 13687, pp. 266–283. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_16
Dong, X., Gan, T., Song, X., Wu, J., Cheng, Y., Nie, L.: Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation. In: CVPR (2022)
Guo, Y., Chen, J., Zhang, H., Jiang, Y.G.: Visual relations augmented cross-modal retrieval. In: ICMR (2020)
Guo, Y., et al.: From general to specific: informative scene graph generation via balance adjustment. In: ICCV (2021)
Hildebrandt, M., Li, H., Koner, R., Tresp, V., Günnemann, S.: Scene graph reasoning for visual question answering. arXiv preprint arXiv:2007.01072 (2020)
Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. IJCV (2017)
Lertnattee, V., Theeramunkong, T.: Analysis of inverse class frequency in centroid-based text classification. In: ISCIT (2004)
Li, M., Qi, Y.: XPNet: cross-domain prototypical network for zero-shot sketch-based image retrieval. In: Yu, S., et al. (eds.) PRCV. LNCS, vol. 13534, pp. 394–410. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-18907-4_31
Li, R., Zhang, S., Wan, B., He, X.: Bipartite graph network with adaptive message passing for unbiased scene graph generation. In: CVPR (2021)
Lyu, X., et al.: Fine-grained predicates learning for scene graph generation. In: CVPR (2022)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP (2014)
Schroeder, B., Tripathi, S.: Structured query-based image retrieval using scene graphs. In: CVPRW (2020)
Song, J., Zeng, P., Gao, L., Shen, H.T.: From pixels to objects: cubic visual attention for visual question answering. arXiv preprint arXiv:2206.01923 (2022)
Song, X., Chen, J., Wu, Z., Jiang, Y.G.: Spatial-temporal graphs for cross-modal Text2Video retrieval. IEEE T-MM (2021)
Tang, K.: A scene graph generation codebase in Pytorch (2020). https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
Tang, K., Niu, Y., Huang, J., Shi, J., Zhang, H.: Unbiased scene graph generation from biased training. In: CVPR (2020)
Tang, K., Zhang, H., Wu, B., Luo, W., Liu, W.: Learning to compose dynamic tree structures for visual contexts. In: CVPR (2019)
Teney, D., Liu, L., van Den Hengel, A.: Graph-structured representations for visual question answering. In: CVPR (2017)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Wang, J., et al.: Seesaw loss for long-tailed instance segmentation. In: CVPR (2021)
Xu, P., Chang, X., Guo, L., Huang, P.Y., Chen, X., Hauptmann, A.G.: A survey of scene graph: generation and application. TNNLS (2020)
Yan, S., et al.: PCPL: predicate-correlation perception learning for unbiased scene graph generation. In: ACM MM (2020)
Yang, X., et al.: Transforming visual scene graphs to image captions. arXiv preprint arXiv:2305.02177 (2023)
Yang, X., Tang, K., Zhang, H., Cai, J.: Auto-encoding scene graphs for image captioning. In: CVPR (2019)
Yu, J., Chai, Y., Wang, Y., Hu, Y., Wu, Q.: CogTree: cognition tree loss for unbiased scene graph generation. arXiv preprint arXiv:2009.07526 (2020)
Zellers, R., Yatskar, M., Thomson, S., Choi, Y.: Neural motifs: scene graph parsing with global context. In: CVPR (2018)
Zhang, A., et al.: Fine-grained scene graph generation with data transfer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV. LNCS, vol. 13687, pp. 409–424. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_24
Zhang, C., Chao, W.L., Xuan, D.: An empirical study on leveraging scene graphs for visual question answering. arXiv preprint arXiv:1907.12133 (2019)
Zhou, X., Li, S., Chen, H., Zhu, A.: Disentangled OCR: a more granular information for “text”-to-image retrieval. In: PRCV (2022)
Acknowledgments
This work was supported by the National Key R&D Program of China under Grant 2022ZD0115502, by the National Natural Science Foundation of China under Grants U21A20514 and 62122010, and by the FuXiaQuan National Independent Innovation Demonstration Zone Collaborative Innovation Platform Project under Grant 3502ZCQXT2022008.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Hu, L., Liu, S., Wang, H. (2024). An Effective Dynamic Reweighting Method for Unbiased Scene Graph Generation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_28
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9