Skip to main content
Log in

Generating comprehensive scene graphs with integrated multiple attribute detection

  • Original Paper
  • Published:
Machine Vision and Applications Aims and scope Submit manuscript

Abstract

Scene graphs present the semantic highlights of the underlying image in directed graph form. Their automated generation is often restricted to detecting multiple relations for objects. The diverse attributes of the objects are neglected in the process. We propose a Multiple Attribute Detector that can capture structured attribute information of an object, i.e. attribute type, value in the form of a triplet. The module is capable of generating multiple such triplets for every detected object in the scene. It can be integrated with existing scene graph generation frameworks without altering relation detection mechanism to yield comprehensive scene graphs. We have also created new datasets for this purpose that include attribute type-value data per object.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Aditya, S., Yang, Y., Baral, C., et al.: From images to sentences through scene description graphs using commonsense reasoning and knowledge. arXiv:1511.03292v1 [cs.CV] (2015)

  2. Aditya, S., Yang, Y., Baral, C., et al.: Image understanding using vision and reasoning through scene description graph. Comput. Vis. Image Underst. 3, 33–45 (2018)

    Article  Google Scholar 

  3. Chen, T., Yu, W., Chen, R., et al.: Knowledge-embedded routing network for scene graph generation. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  4. Farhadi, A,. Endres, I., Hoiem, D., et al.: Describing objects by their attributes. In: IEEE Conference on Computer Vision and Pattern Recognition (2009)

  5. Gu, J., Zhao, H., Lin, Z., et al.: Scene graph generation with external knowledge and image reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  6. Hudson, D., Manning, C.D.: Gqa–a new dataset for real-world visual reasoning and compositional question answering. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 6700–6709 (2019)

  7. Isola, P., Lim, J.J., Adelson, E.H.: Discovering states and transformations in image collections. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)

  8. Johnson, J., Krishna, R., Stark, M., et al.: Image retrieval using scene graphs. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)

  9. Johnson, J., Gupta, A., Fei-Fei, L .: Image generation from scene graphs. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)

  10. Klawonn, M., Heim, E.: Generating triples with adversarial networks for scene graph construction. In: Association for the Advancement of Artificial Intelligence (2018)

  11. Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014)

  12. Li, Y., Ouyang, W., Zhou, B., et al.: Scene graph generation from objects, phrases and region captions. In: International Conference on Computer Vision (2017)

  13. Li, Y., Ouyang, W., Zhou, B., et al.: Factorizable net: An efficient subgraph-based framework for scene graph generation. In: European Conference on Computer Vision (2018)

  14. Li, Y.L., Xu, Y., Mao, X., et al.: Symmetry and group in attribute-object compositions. In: IEEE Conference on Computer Vision and Pattern Recognition (2020)

  15. Lin, X., Ding, C., Zeng, J., et al.: (2020). Gps-net: Graph property sensing network for scene graph generation. In: IEEE Conference on Computer Vision and Pattern Recognition

  16. Liu, H., Yan, N., Mortazavi, M., et al.: Fully convolutional scene graph generation. In: IEEE Conference on Computer Vision and Pattern Recognition (2021)

  17. Lu, C., Krishna, R., Bernstein, M., et al.: Visual relationship detection with language priors. In: European Conference on Computer Vision (2016)

  18. Mancini, M., Naeem, M.F., Xian, Y., et al.: Open world compositional zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition (2021)

  19. Misra, I., Gupta, A., Hebert, M .:From red wine to red tomato - composition with context. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)

  20. Naeem M.F, Xian Y, Tombari F, et al Learning graph embeddings for compositional zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition (2021)

  21. Nagarajan, T., Grauman, K.: Attributes as operators. In: European Conference on Computer Vision (2018)

  22. Nan, Z., Liu, Y., Zheng, N., et al Recognizing unseen attribute-object pair with generative model. In: Association for the Advancement of Artificial Intelligence (2019)

  23. Parikh, D., Grauman, K.: Relative attributes. In: International Conference on Computer Vision (2011)

  24. Purushwalkam, S., Nickel, M, Gupta, A., et al. Task-driven modular networks for zero-shot compositional learning. In: International Conference on Computer Vision (2019)

  25. Krishna, R., YZ, Groth, O., Johnson, J., et al.: Visual genome: Connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vis. 123 (2017)

  26. Ren, S., He, K., Girshick, R., et al.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2017)

    Article  Google Scholar 

  27. Schuster, S., Krishna, R., Chang, A., et al.: Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In: ACL Workshop on Vision and Language, pp. 70–80 (2015)

  28. Shi, J., Zhang, H., Li, J.: Explainable and explicit visual reasoning over scene graphs. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  29. Tang, K.: Scene graph benchmark in pytorch. https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch (2020)

  30. Tang, K., Zhang, H., Wu, B., et al.: Learning to compose dynamic tree structures for visual contexts. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  31. Tang, K., Niu, Y., Huang, J., et al .: Unbiased scene graph generation from biased training. In: IEEE Conference on Computer Vision and Pattern Recognition (2020)

  32. Teney, D., Liu, L., van Den Hengel, A.: Graph-structured representations for visual question answering. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)

  33. Wan, H., Luo, Y., Peng, B., et al.: Representation learning for scene graph completion via jointly structural and visual embedding. In: International Joint Conference on Artificial Intelligence (2018)

  34. Wei, K., Yang, M., Wang, H., et al.: Adversarial fine-grained composition learning for unseen attribute-object recognition. In: International Conference on Computer Vision (2019)

  35. Woo, S., Kim, D., Cho, D., et al.: Linknet: Relational embedding for scene graph. In: Neural Information Processing Systems (2018)

  36. Xu, D., Zhu, Y., Choy, C.B., et al.: Scene graph generation by iterative message passing. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)

  37. Yang, J., Lu, J., Lee, S., et al.: (2018) Graph r-cnn for scene graph generation. In: European Conference on Computer Vision (2017)

  38. Yang, X., Tang, K., Zhang, H., et al.: Auto-encoding scene graphs for image captioning. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  39. Yao, T., Pan, Y., Li, Y., et al Exploring visual relationship for image captioning. In: European Conference on Computer Vision (2018)

  40. Yu, A., Grauman K.: Semantic jitter: Dense supervision for visual comparisons via synthetic images. In: International Conference on Computer Vision (2017)

  41. Zareian, A., Karaman, S., et al.: Bridging knowledge graphs to generate scene graphs. In: European Conference on Computer Vision (2020a)

  42. Zareian, A., Wang, Z., You, H., et al.: Learning visual commonsense for robust scene graph generation. In: European Conference on Computer Vision (2020b)

  43. Zellers, R., Yatskar, M., Thomson, S., et al.: Neural motifs-scene graph parsing with global context. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Charulata Patil.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Patil, C., Abhyankar, A. Generating comprehensive scene graphs with integrated multiple attribute detection. Machine Vision and Applications 34, 11 (2023). https://doi.org/10.1007/s00138-022-01361-3

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00138-022-01361-3

Keywords

Navigation