DOI: 10.1145/3394171.3413566

Visual Relation of Interest Detection

Published: 12 October 2020

Abstract

In this paper, we propose a novel Visual Relation of Interest Detection (VROID) task, which aims to detect the visual relations that are important for conveying the main content of an image. The task is motivated by the intuition that not all correctly detected relations are really "interesting" in semantics: only a fraction of them are meaningful for representing the main content of the image. We name such relations Visual Relations of Interest (VROIs). VROID can be viewed as an evolution of the traditional Visual Relation Detection (VRD) task, which tries to discover all visual relations in an image. To facilitate research on this new task, we construct a new dataset, named ViROI, which contains 30,120 images, each annotated with VROIs. Furthermore, we develop an Interest Propagation Network (IPNet) to solve VROID. IPNet contains a Panoptic Object Detection (POD) module, a Pair Interest Prediction (PaIP) module, and a Predicate Interest Prediction (PrIP) module. The POD module extracts instances from the input image and generates the corresponding instance features and union features. The PaIP module then predicts the interest score of each instance pair, while the PrIP module predicts that of each predicate for each instance pair. The interest score of each instance pair is combined with those of the corresponding predicates to produce the final interest scores; all VROI candidates are sorted by final interest score, and the highest-ranked ones are taken as the final results. We conduct extensive experiments to test the effectiveness of our method, and the results show that IPNet achieves the best performance compared with baselines on visual relation detection, scene graph generation, and image captioning.
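
To make the scoring and ranking step concrete, the following is a minimal sketch in PyTorch. It is not the authors' implementation: the module internals, the feature dimensionality, the predicate vocabulary size, and the multiplicative combination of pair and predicate interest scores are all illustrative assumptions, and in IPNet the features would come from the POD module rather than from random tensors.

```python
# Minimal sketch of the IPNet-style scoring pipeline described above.
# All internals and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

D = 256              # assumed feature dimensionality
NUM_PREDICATES = 50  # assumed predicate vocabulary size

class PairInterestPredictor(nn.Module):
    """Scores how interesting a (subject, object) instance pair is."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * D, D), nn.ReLU(), nn.Linear(D, 1))

    def forward(self, subj, obj, union):
        # Concatenate subject, object, and union-region features.
        x = torch.cat([subj, obj, union], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # (P,) pair interest

class PredicateInterestPredictor(nn.Module):
    """Scores each predicate for a given instance pair."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * D, D), nn.ReLU(),
                                 nn.Linear(D, NUM_PREDICATES))

    def forward(self, subj, obj, union):
        x = torch.cat([subj, obj, union], dim=-1)
        return torch.sigmoid(self.mlp(x))               # (P, R) predicate interest

def rank_vroi_candidates(subj, obj, union, top_k=10):
    """Combine pair and predicate interest scores and rank all candidates."""
    # Untrained modules with random weights, purely for illustration.
    pair_score = PairInterestPredictor()(subj, obj, union)       # (P,)
    pred_score = PredicateInterestPredictor()(subj, obj, union)  # (P, R)
    # Assumed combination rule: final score = pair interest * predicate interest.
    final = pair_score.unsqueeze(-1) * pred_score                # (P, R)
    scores, idx = final.flatten().topk(min(top_k, final.numel()))
    pairs = idx // NUM_PREDICATES   # which instance pair
    preds = idx % NUM_PREDICATES    # which predicate for that pair
    return list(zip(pairs.tolist(), preds.tolist(), scores.tolist()))

# Stand-in for POD-module outputs: P candidate pairs with random features.
P = 12
subj, obj, union = (torch.randn(P, D) for _ in range(3))
print(rank_vroi_candidates(subj, obj, union, top_k=5))
```

Factorizing the triplet score into a pair-level term and a per-predicate term means each candidate pair is scored once and the result is reused across all predicates, which keeps the ranking over subject-predicate-object triplets tractable.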

Supplementary Material

MP4 File (3394171.3413566.mp4)
We propose a new Visual Relation of Interest Detection task that aims to detect the visual relations that are important for conveying the main content of an image, motivated by the intuition that not all correctly detected relations are really "interesting" in semantics and only a fraction of them are meaningful for representing the main content of the image. To facilitate research on this new task, we construct a new dataset, named ViROI, which contains 30,120 images, each annotated with VROIs. Furthermore, we develop an Interest Propagation Network to solve Visual Relation of Interest Detection. It contains a Panoptic Object Detection module, a Pair Interest Prediction module, and a Predicate Interest Prediction module. We conduct extensive experiments to test the effectiveness of our method, and the results show that our Interest Propagation Network achieves the best performance compared with baselines on visual relation detection, scene graph generation, and image captioning.

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN: 9781450379885
    DOI: 10.1145/3394171

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. interest estimation
    2. interest propagation network
    3. visual relation detection
    4. visual relation of interest

    Qualifiers

    • Research-article

    Funding Sources

    • Natural Science Foundation of Jiangsu Province
    • Collaborative Innovation Center of Novel Software Technology and Industrialization
    • National Science Foundation of China
    • Science, Technology and Innovation Commission of Shenzhen Municipality

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2024) 3D Scene Graph Generation From Point Clouds. IEEE Transactions on Multimedia 26, 5358-5368. https://doi.org/10.1109/TMM.2023.3331583
    • (2023) Fine-Grained Scene Graph Generation with Overlap Region and Geometrical Center. Computer Graphics Forum 41(7), 359-370. https://doi.org/10.1111/cgf.14683
    • (2023) A Balanced Relation Prediction Framework for Scene Graph Generation. Artificial Neural Networks and Machine Learning – ICANN 2023, 216-228. https://doi.org/10.1007/978-3-031-44216-2_18
    • (2022) Complete interest propagation from part for visual relation of interest detection. International Journal of Machine Learning and Cybernetics 14(2), 455-465. https://doi.org/10.1007/s13042-022-01603-w
    • (2021) Reproducibility Companion Paper. Proceedings of the 29th ACM International Conference on Multimedia, 3633-3637. https://doi.org/10.1145/3474085.3477940
    • (2021) Recovering the Unbiased Scene Graphs from the Biased Ones. Proceedings of the 29th ACM International Conference on Multimedia, 1581-1590. https://doi.org/10.1145/3474085.3475297
    • (2021) Topic Scene Graph Generation by Attention Distillation from Caption. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 15880-15890. https://doi.org/10.1109/ICCV48922.2021.01560
