Evidential Representation Proposal for Predicate Classification Output Logits in Scene Graph Generation

  • Conference paper
Artificial Intelligence in HCI (HCII 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14734)

Abstract

A scene graph is a collection of triplets <subject, predicate, object> that describes the content of an image. One challenging problem in Scene Graph Generation (SGG) is that annotators tend to give weakly relevant predicates, which biases models toward less informative triplet predictions. This paper focuses on the predicate classification task. We question the information processing that leads current models to deduce poorly informative predicates. We argue that the set of possible predicates should not be regarded as a probability space, notably because the granularity of predicates varies, as with \(on\) and \(sitting \; on\). We suggest an alternative representation of the information in the Dempster-Shafer framework, using a hierarchy constructed in a goal-oriented way. Thanks to this more trustworthy representation, we propose a flexible decision-making procedure that lets us adjust the level of granularity of the predicted predicate. Our experiments, carried out using scores estimated by an existing transformer-based scene graph generation model, show that our method helps reduce the long-tail problem.
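
To illustrate the general idea of representing predicates of varying granularity in the Dempster-Shafer framework, the following minimal Python sketch assigns mass to sets of predicates rather than to single predicates, then picks the most specific predicate whose belief reaches a threshold. Everything in it is an assumption made for illustration: the three-leaf toy hierarchy, the mass values, the names `HIERARCHY`, `belief`, `plausibility` and `decide`, and the threshold rule itself. It is not the hierarchy, mass construction, or decision procedure used in the paper.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

# Hypothetical toy hierarchy: the coarse predicate "on" is modelled as the
# set of the finer predicates it covers; fine predicates are singletons.
HIERARCHY: Dict[str, FrozenSet[str]] = {
    "sitting on": frozenset({"sitting on"}),
    "standing on": frozenset({"standing on"}),
    "parked on": frozenset({"parked on"}),
    "on": frozenset({"sitting on", "standing on", "parked on"}),
}


def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Bel(A): total mass of focal sets entirely contained in A."""
    return sum(v for b, v in m.items() if b <= a)


def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def decide(m: Mass, threshold: float) -> str:
    """Return the most specific predicate whose belief reaches the threshold.

    Illustrative rule only (an assumption): smaller sets are preferred, and
    the root predicate "on" is the fallback when nothing else qualifies.
    """
    candidates = [
        (len(s), name)
        for name, s in HIERARCHY.items()
        if belief(m, s) >= threshold
    ]
    return min(candidates)[1] if candidates else "on"


# Made-up mass function: part of the evidence supports a fine predicate,
# part of it only supports the coarse predicate "on".
m: Mass = {
    HIERARCHY["sitting on"]: 0.5,
    HIERARCHY["standing on"]: 0.1,
    HIERARCHY["on"]: 0.4,
}

low_threshold_choice = decide(m, 0.45)
high_threshold_choice = decide(m, 0.8)
```

With these made-up masses, a permissive threshold selects the fine-grained predicate, while a stricter threshold falls back to the coarse one. This is one simple way a decision rule can trade specificity against caution.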

Acknowledgement

This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Author information

Corresponding author

Correspondence to Lucie Kunitomo-Jacquin.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kunitomo-Jacquin, L., Fukuda, K. (2024). Evidential Representation Proposal for Predicate Classification Output Logits in Scene Graph Generation. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2024. Lecture Notes in Computer Science, vol 14734. Springer, Cham. https://doi.org/10.1007/978-3-031-60606-9_22

  • DOI: https://doi.org/10.1007/978-3-031-60606-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60605-2

  • Online ISBN: 978-3-031-60606-9

  • eBook Packages: Computer Science (R0)
