Iterative Visual Relationship Detection via Commonsense Knowledge Graph

  • Conference paper
  • Semantic Technology (JIST 2019)

Abstract

Visual relationship detection, i.e., discovering the interactions between pairs of objects in an image, plays a significant role in image understanding. However, most recent works consider only visual features, ignoring the implicit influence of common sense. Motivated by iterative visual reasoning in image recognition, we propose a novel model that takes advantage of common sense, in the form of a knowledge graph, for visual relationship detection, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC). Our model consists of two modules: a feature module that predicts predicates from visual and semantic features with a bi-directional RNN, and a commonsense knowledge module that constructs a commonsense knowledge graph specific to predicate prediction. After iteratively combining the predictions from both modules, IVRDC updates its memory and the commonsense knowledge graph. The final predictions are made by taking the result of each iteration into account with an attention mechanism. Our experiments on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
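
A minimal sketch of the loop the abstract describes may help fix ideas. The PyTorch code below is an illustration only: the class name IVRDCSketch, all tensor shapes, the GRU/GRUCell choices, and the additive way the two modules' predictions are combined are our assumptions, not the paper's actual formulation, which is given in the paper body.

```python
import torch
import torch.nn as nn

class IVRDCSketch(nn.Module):
    """Illustrative-only sketch of the iterative two-module loop."""

    def __init__(self, feat_dim=512, num_predicates=70, num_iters=3):
        super().__init__()
        # Feature module: a bi-directional RNN over the per-pair feature
        # sequence (e.g. subject, union region, object), as the abstract
        # describes at a high level.
        self.rnn = nn.GRU(feat_dim, feat_dim, bidirectional=True,
                          batch_first=True)
        self.feat_head = nn.Linear(2 * feat_dim, num_predicates)
        # Commonsense knowledge module: a placeholder for reasoning over
        # the commonsense knowledge graph (details live in the paper body).
        self.kg_head = nn.Linear(feat_dim, num_predicates)
        # Memory that is updated after each iteration.
        self.memory = nn.GRUCell(num_predicates, feat_dim)
        # Attention over the per-iteration results.
        self.attn = nn.Linear(feat_dim, 1)
        self.num_iters = num_iters

    def forward(self, pair_feats, kg_feats):
        # pair_feats: (B, 3, feat_dim) subject/union/object features
        # kg_feats:   (B, feat_dim) graph-derived features (hypothetical)
        mem = pair_feats.mean(dim=1)  # initial memory state
        iter_logits, iter_states = [], []
        for _ in range(self.num_iters):
            inp = pair_feats + mem.unsqueeze(1)  # condition on the memory
            rnn_out, _ = self.rnn(inp)
            p_feat = self.feat_head(rnn_out.mean(dim=1))  # feature module
            p_kg = self.kg_head(kg_feats + mem)           # knowledge module
            combined = p_feat + p_kg          # combine the two predictions
            mem = self.memory(combined, mem)  # update the memory
            iter_logits.append(combined)
            iter_states.append(mem)
        # Final prediction: attention over the result of each iteration.
        states = torch.stack(iter_states, dim=1)            # (B, T, D)
        weights = torch.softmax(self.attn(states).squeeze(-1), dim=1)
        logits = torch.stack(iter_logits, dim=1)            # (B, T, P)
        return (weights.unsqueeze(-1) * logits).sum(dim=1)  # (B, P)
```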


Notes

  1. Throughout this paper, we identify the predicate in visual relationship detection with the relation in a scene graph.


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61375056, 61876204, 61976232, and 51978675), the Guangdong Province Natural Science Foundation (Nos. 2017A070706010 (soft science) and 2018A030313086), the All-China Federation of Returned Overseas Chinese Research Project (No. 17BZQK216), and the Science and Technology Program of Guangzhou (Nos. 201804010496 and 201804010435).

Author information

Correspondence to Jianfeng Du or Juan Zeng.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wan, H., Ou, J., Wang, B., Du, J., Pan, J.Z., Zeng, J. (2020). Iterative Visual Relationship Detection via Commonsense Knowledge Graph. In: Wang, X., Lisi, F., Xiao, G., Botoeva, E. (eds) Semantic Technology. JIST 2019. Lecture Notes in Computer Science, vol. 12032. Springer, Cham. https://doi.org/10.1007/978-3-030-41407-8_14

  • DOI: https://doi.org/10.1007/978-3-030-41407-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41406-1

  • Online ISBN: 978-3-030-41407-8

  • eBook Packages: Computer Science (R0)
