
Causal Property Based Anti-conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation

  • Conference paper
  • First Online:
  • Conference proceedings: Computer Vision – ACCV 2022 (ACCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13844)

  • Included in the following conference series: Asian Conference on Computer Vision

Abstract

Scene Graph Generation (SGG) aims to detect visual triplets of pairwise objects on top of object detection. Three key factors are commonly explored to determine a scene graph: visual information, local and global context, and prior knowledge. However, conventional methods that balance losses among these factors lead to conflict, causing ambiguity, inaccuracy, and inconsistency. In this work, to apply evidence theory to scene graph generation, a novel plug-and-play Causal Property based Anti-conflict Modeling (CPAM) module is proposed, which models the key factors with Dempster-Shafer evidence theory and integrates their quantitative information effectively. Compared with existing methods, the proposed CPAM makes the training process interpretable and covers more fine-grained relationships after the reduction of inconsistencies. Furthermore, we propose a Hybrid Data Augmentation (HDA) method, which facilitates data transfer as well as conventional debiasing methods to enhance the dataset. By combining CPAM with HDA, significant improvement is achieved over the previous state-of-the-art methods, and extensive ablation studies demonstrate the effectiveness of our method.
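For readers unfamiliar with Dempster-Shafer evidence theory, the sketch below illustrates the basic combination rule that such evidence-based modeling builds on. It is a minimal, generic illustration and not the CPAM module itself: the toy frame of discernment, the three mass functions standing in for visual information, context, and prior knowledge, and all numbers are assumptions made for illustration only.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions over the same frame using Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to a mass in [0, 1]; the masses of each source sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources cannot be combined")
    # Normalise by the non-conflicting mass (1 - K)
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Toy frame of discernment: three candidate predicates for one object pair.
ON, NEAR, HOLDING = "on", "near", "holding"
theta = frozenset({ON, NEAR, HOLDING})

# Hypothetical mass functions from three evidence sources
# (visual features, context, prior knowledge) -- illustrative numbers only.
m_visual  = {frozenset({ON}): 0.6, frozenset({ON, NEAR}): 0.3, theta: 0.1}
m_context = {frozenset({NEAR}): 0.4, frozenset({ON, NEAR}): 0.4, theta: 0.2}
m_prior   = {frozenset({ON}): 0.5, theta: 0.5}

fused = dempster_combine(dempster_combine(m_visual, m_context), m_prior)
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(set(focal), round(mass, 3))
```

The appeal of this style of fusion, which the abstract alludes to, is that the normalisation term explicitly quantifies the conflict between sources, and mass can be assigned to sets of predicates rather than forced onto a single class when the evidence is ambiguous.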



Acknowledgement

This work was supported by the National Key R&D Program of China (2019YFB2204200) and the National Natural Science Foundation of China (62006015 and 62072028).

Author information

Corresponding author

Correspondence to Gaoyun An.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, R., An, G. (2023). Causal Property Based Anti-conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13844. Springer, Cham. https://doi.org/10.1007/978-3-031-26316-3_34


  • DOI: https://doi.org/10.1007/978-3-031-26316-3_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26315-6

  • Online ISBN: 978-3-031-26316-3

  • eBook Packages: Computer Science, Computer Science (R0)
