Abstract
Visual relationship detection, i.e., discovering the interactions between pairs of objects in an image, plays a significant role in image understanding. However, most recent works consider only visual features, ignoring the implicit effect of common sense. Motivated by iterative visual reasoning in image recognition, we propose a novel model, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC), that takes advantage of common sense in the form of a knowledge graph for visual relationship detection. Our model consists of two modules: a feature module that predicts predicates from visual and semantic features with a bi-directional RNN, and a commonsense knowledge module that constructs a task-specific commonsense knowledge graph for predicate prediction. After iteratively combining the predictions from both modules, IVRDC updates its memory and the commonsense knowledge graph. The final predictions are made by aggregating the result of each iteration with an attention mechanism. Our experiments on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
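The iterative scheme described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the additive fusion of the two modules' scores, the max-based attention logits, and all variable names are assumptions made purely to show the shape of the computation (per-iteration combination, then attention-weighted aggregation over iterations).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-iteration predicate scores from the two modules
# (shape: [num_iterations T, num_predicates P]); values are illustrative.
rng = np.random.default_rng(0)
T, P = 3, 5
feature_scores = rng.normal(size=(T, P))    # from the feature module
knowledge_scores = rng.normal(size=(T, P))  # from the commonsense knowledge module

# Step 1: at each iteration, combine the two modules' scores
# (additive fusion is an assumption for this sketch).
combined = feature_scores + knowledge_scores        # [T, P]
per_iter_pred = softmax(combined, axis=-1)          # per-iteration predicate distributions

# Step 2: attend over iterations so every iteration's result
# contributes to the final prediction.
attn_logits = combined.max(axis=-1)                 # crude per-iteration relevance score
attn = softmax(attn_logits)                         # [T], sums to 1
final_pred = attn @ per_iter_pred                   # [P], attention-weighted final distribution
```

Because each per-iteration prediction is a distribution and the attention weights sum to one, `final_pred` is itself a valid distribution over predicates.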
Notes
- 1.
Throughout this paper, we use the term predicate in visual relationship detection interchangeably with relation in scene graphs.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Nos. 61375056, 61876204, 61976232, and 51978675), the Guangdong Province Natural Science Foundation (Nos. 2017A070706010 (soft science) and 2018A030313086), the All-China Federation of Returned Overseas Chinese Research Project (No. 17BZQK216), and the Science and Technology Program of Guangzhou (Nos. 201804010496 and 201804010435).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Wan, H., Ou, J., Wang, B., Du, J., Pan, J.Z., Zeng, J. (2020). Iterative Visual Relationship Detection via Commonsense Knowledge Graph. In: Wang, X., Lisi, F., Xiao, G., Botoeva, E. (eds) Semantic Technology. JIST 2019. Lecture Notes in Computer Science(), vol 12032. Springer, Cham. https://doi.org/10.1007/978-3-030-41407-8_14
DOI: https://doi.org/10.1007/978-3-030-41407-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41406-1
Online ISBN: 978-3-030-41407-8
eBook Packages: Computer Science (R0)