Visual relationship detection based on bidirectional recurrent neural network

Dai, Yibo; Wang, Chao; Dong, Jian; Sun, Changyin

doi:10.1007/s11042-019-7732-z

Published: 14 May 2019

Multimedia Tools and Applications
Volume 79, pages 35297–35313 (2020)

Yibo Dai¹, Chao Wang¹, Jian Dong¹ (ORCID: orcid.org/0000-0002-1956-0900) & Changyin Sun¹

Abstract

Visual relationship detection aims to mine the interactions between pairs of objects in an image and to describe the image in the form (subject, predicate, object). Most previous works treat it as a pure classification problem, taking whole triplets as image labels; however, the huge number of possible object combinations and the diversity of predicates make this approach difficult. We therefore propose a deep model based on a modified bidirectional recurrent neural network (BRNN) that classifies objects and predicts predicates simultaneously. The BRNN extracts latent relationship information from the image, and we propose a feature-infusion method on top of it. Additionally, we improve on existing works by introducing a paired non-maximum suppression (NMS) method. Experiments show that our approach is competitive with state-of-the-art methods.
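The abstract does not define paired NMS. The sketch below is a minimal illustration under stated assumptions, not the authors' method. Each relationship candidate is assumed to carry a subject box, an object box, a confidence score and a (subject, predicate, object) label triplet. A candidate is treated as a duplicate when a higher-scoring kept candidate has the same triplet and *both* its subject and object boxes overlap with IoU at or above a threshold. This criterion is hypothetical, and the paper's exact rule may differ. The names `iou` and `paired_nms` and the default threshold of 0.5 are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]
Triplet = Tuple[str, str, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def paired_nms(subj_boxes: Sequence[Box], obj_boxes: Sequence[Box],
               scores: Sequence[float], labels: Sequence[Triplet],
               thresh: float = 0.5) -> List[int]:
    """Greedy NMS over (subject, object) box pairs (hypothetical criterion).

    A candidate is suppressed when an already-kept, higher-scoring candidate
    has the same label triplet and the smaller of its subject IoU and object
    IoU is >= thresh, i.e. both boxes of the pair overlap.
    Returns the indices of kept candidates in descending score order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        duplicate = False
        for k in keep:
            if labels[i] != labels[k]:
                continue  # different relationships never suppress each other
            pair_overlap = min(iou(subj_boxes[i], subj_boxes[k]),
                               iou(obj_boxes[i], obj_boxes[k]))
            if pair_overlap >= thresh:
                duplicate = True
                break
        if not duplicate:
            keep.append(i)
    return keep


if __name__ == "__main__":
    subj = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 10, 10)]
    obj = [(20, 20, 30, 30), (21, 21, 30, 30), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    labels = [("person", "ride", "horse")] * 3
    print(paired_nms(subj, obj, scores, labels))  # [0, 2]
```

In the usage example, candidate 1 overlaps candidate 0 on both boxes (IoU 0.81 each) and is suppressed. Candidate 2 shares the subject box but its object box is disjoint, so it survives. Ordinary per-box NMS cannot draw this distinction, because it would compare subject boxes alone. Taking the minimum of the two IoUs is one reasonable design choice; the product of the two IoUs would be a stricter alternative.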

Acknowledgements

This project is partly supported by the National Natural Science Foundation of China (grants 61773117 and 61473086).

Author information

Authors and Affiliations

Key Lab of Measurement and Control of Complex Systems of Engineering, School of Automation, Southeast University, Nanjing, 210096, China
Yibo Dai, Chao Wang, Jian Dong & Changyin Sun

Corresponding author

Correspondence to Jian Dong.

About this article

Cite this article

Dai, Y., Wang, C., Dong, J. et al. Visual relationship detection based on bidirectional recurrent neural network. Multimed Tools Appl 79, 35297–35313 (2020). https://doi.org/10.1007/s11042-019-7732-z

Received: 10 December 2018
Revised: 30 March 2019
Accepted: 03 May 2019
Published: 14 May 2019
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-019-7732-z
