Abstract
A scene graph is a collection of triplets <subject, predicate, object> that describes the content of an image. One challenging problem in Scene Graph Generation (SGG) is that annotators tend to give poorly relevant predicates, which biases models toward less informative triplet predictions. This paper focuses on the predicate classification task. We question the information processing that leads current models to deduce poorly informative predicates. We argue that the set of possible predicates should not be regarded as a probability space, notably because the granularity of predicates varies, as with \(on\) and \(sitting \; on\). We suggest an alternative representation of the information in the Dempster-Shafer framework, using a hierarchy constructed in a goal-oriented way. Building on this more trustworthy representation, we propose a flexible decision-making procedure that makes it possible to adjust the level of granularity of the predicted predicate. Our experiments, carried out using scores estimated by an existing transformer-based scene graph generation model, show that our method helps mitigate the long-tail problem.
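To make the idea of a Dempster-Shafer representation over predicates of varying granularity more concrete, here is a minimal sketch. It is not the method of the paper, whose construction of masses from logits is not given in the abstract. The toy hierarchy, the mass values, the threshold rule, and the names `ON`, `masses`, `belief`, `plausibility`, and `decide` are all assumptions made for illustration. The coarse predicate "on" is read here as the set of its fine-grained refinements, so evidence for "on" is not forced onto any single one of them.

```python
from typing import Dict, FrozenSet, List, Optional

Mass = Dict[FrozenSet[str], float]

# Hypothetical toy hierarchy: "on" is modelled as the set of its refinements.
ON = frozenset({"sitting on", "standing on", "lying on"})

# Hypothetical mass function: weights are assigned to sets of predicates
# and sum to 1. The mass on ON expresses "some kind of on, unspecified".
masses: Mass = {
    frozenset({"sitting on"}): 0.5,
    frozenset({"standing on"}): 0.2,
    ON: 0.3,
}


def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Total mass of focal sets entirely contained in a."""
    return sum(w for b, w in m.items() if b <= a)


def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Total mass of focal sets that intersect a."""
    return sum(w for b, w in m.items() if b & a)


def decide(
    m: Mass, candidates: List[FrozenSet[str]], threshold: float
) -> Optional[FrozenSet[str]]:
    """Illustrative rule (an assumption, not the paper's procedure):
    return the smallest candidate whose belief reaches the threshold."""
    eligible = [c for c in candidates if belief(m, c) >= threshold]
    return min(eligible, key=len) if eligible else None


# Candidates: every fine-grained predicate plus the coarse set ON.
candidates = [frozenset({p}) for p in sorted(ON)] + [ON]

if __name__ == "__main__":
    sitting = frozenset({"sitting on"})
    print(belief(masses, sitting), plausibility(masses, sitting))
    print(decide(masses, candidates, 0.4))  # lenient: a fine-grained answer
    print(decide(masses, candidates, 0.6))  # strict: falls back to the coarse set
```

In this sketch, raising the threshold trades specificity for caution: the rule returns a fine-grained predicate only when the evidence committed to it alone is strong enough, and otherwise answers at the coarser level.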
Acknowledgement
This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kunitomo-Jacquin, L., Fukuda, K. (2024). Evidential Representation Proposal for Predicate Classification Output Logits in Scene Graph Generation. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2024. Lecture Notes in Computer Science, vol 14734. Springer, Cham. https://doi.org/10.1007/978-3-031-60606-9_22
DOI: https://doi.org/10.1007/978-3-031-60606-9_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60605-2
Online ISBN: 978-3-031-60606-9
eBook Packages: Computer Science (R0)