Modelling relations with prototypes for visual relation detection

Plesse, François; Ginsca, Alexandru; Delezoide, Bertrand; Prêteux, Françoise

doi:10.1007/s11042-020-09001-6

Published: 04 August 2020

Volume 80, pages 22465–22486 (2021)

Multimedia Tools and Applications

François Plesse¹,² (ORCID: orcid.org/0000-0002-8631-3758), Alexandru Ginsca¹, Bertrand Delezoide¹ & Françoise Prêteux²

Abstract

Relations between objects drive our understanding of images. Modelling them poses several challenges due to the combinatorial nature of the problem and the complex structure of natural language. This paper tackles the task of predicting relationships in the form of (subject, relation, object) triplets from still images. To address these issues, we propose a framework for learning relation prototypes that aims to capture the complex nature of relation distributions. Concurrently, a network is trained to define a space in which relationship triplets with similar spatial layouts, interacting objects and relations are clustered together. Finally, the network is compared to two models explicitly tackling the problem of synonymy among relations. For this, two well-known scene-graph labelling benchmarks are used for training and testing: VRD and Visual Genome. Predicting relations based on distance to prototypes provides a significant increase in the diversity of predicted relations, improving the average relation recall from 40.3% to 41.7% on VRD and from 31.3% to 35.4% on Visual Genome.
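
To make the idea of prediction by distance to prototype concrete, the following is a minimal sketch and not the authors' implementation. It assumes squared Euclidean distance, a toy two-dimensional embedding space, hand-picked prototypes, and a plain per-relation recall average; none of these details are given in the abstract, and the function names `predict_relations` and `average_relation_recall` are hypothetical.

```python
import numpy as np


def predict_relations(embeddings, prototypes, prototype_labels):
    """Assign each embedding the relation label of its nearest prototype.

    Assumption: squared Euclidean distance; the abstract does not name the metric.
    """
    # Pairwise squared distances, shape (n_samples, n_prototypes).
    diffs = embeddings[:, None, :] - prototypes[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return prototype_labels[np.argmin(dists, axis=1)]


def average_relation_recall(y_true, y_pred):
    """Mean of per-relation recalls, so rare relations weigh as much as frequent ones."""
    recalls = [np.mean(y_pred[y_true == r] == r) for r in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy data (illustrative only): relation 0 has two prototypes, relation 1 has one.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
prototype_labels = np.array([0, 0, 1])
embeddings = np.array([[0.2, -0.1], [3.8, 0.3], [2.1, 2.7], [1.0, 0.5]])
y_true = np.array([0, 0, 1, 1])

y_pred = predict_relations(embeddings, prototypes, prototype_labels)
score = average_relation_recall(y_true, y_pred)
print(y_pred, score)
```

Allowing several prototypes per relation, as in this toy setup, lets one relation cover distinct clusters in the embedding space. Averaging recall per relation rather than per sample is what makes the metric sensitive to the diversity of predicted relations.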

Author information

Authors and Affiliations

CEA, LIST, F-91191, Gif-sur-Yvette, France
François Plesse, Alexandru Ginsca & Bertrand Delezoide
CERMICS, Ecole des Ponts, Champs-sur-Marne, France
François Plesse & Françoise Prêteux

Corresponding author

Correspondence to François Plesse.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Plesse, F., Ginsca, A., Delezoide, B. et al. Modelling relations with prototypes for visual relation detection. Multimed Tools Appl 80, 22465–22486 (2021). https://doi.org/10.1007/s11042-020-09001-6

Received: 20 March 2019
Revised: 19 February 2020
Accepted: 01 May 2020
Published: 04 August 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-020-09001-6
