DOI: 10.1145/3603273.3635669
research-article

Transformer-based Deep Embedding Network for Scene Graph Generation

Published: 09 January 2024

Abstract

Scene graph generation (SGG), which detects the objects in an image together with the relationships between them, has received growing attention as a challenging task in computer vision. Because these relationships are complex and highly variable, they are difficult to detect. Most existing SGG methods are either two-stage or point-based single-stage methods, and they typically suffer from high time complexity or from poor design assumptions. In this paper, we adopt a single-stage generation method inspired by the transformer. A Convolutional Neural Network (CNN) backbone still extracts the image features, which are then passed to a transformer for encoding and decoding; the decoder outputs are processed to obtain the scene graph. The contributions of this paper are 1) adding a predicate generator to the conventional transformer decoder and 2) evaluating the method on several improved datasets based on Visual Genome, where the results show that it improves the relationship recognition ability of SGG.
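The abstract leaves the final step, processing the decoder outputs into a scene graph, unspecified. The sketch below shows one plausible way to do it under stated assumptions, not the authors' method: each decoder query is assumed to predict exactly one (subject, predicate, object) triplet as three sets of class logits (a format used by some DETR-style relation detectors, not confirmed for this paper); the predicate logits stand in for whatever the paper's predicate generator outputs; and a triplet's confidence is taken as the product of its three class probabilities. All names (`decode_triplets`, `to_scene_graph`, the `subj`/`pred`/`obj` keys) and the toy logits are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Triplet:
    """One predicted relationship: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str
    score: float


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decode_triplets(query_outputs, object_classes, predicate_classes,
                    top_k=100, min_score=0.0):
    """Turn per-query logits into triplets ranked by confidence.

    ASSUMPTION (hypothetical format): each decoder query yields one triplet,
    given as a dict with keys 'subj', 'pred' and 'obj' holding class logits.
    """
    triplets = []
    for q in query_outputs:
        subj_p, pred_p, obj_p = softmax(q["subj"]), softmax(q["pred"]), softmax(q["obj"])
        s, p, o = argmax(subj_p), argmax(pred_p), argmax(obj_p)
        # Triplet confidence = product of the three class probabilities.
        score = subj_p[s] * pred_p[p] * obj_p[o]
        if score >= min_score:
            triplets.append(Triplet(object_classes[s], predicate_classes[p],
                                    object_classes[o], score))
    triplets.sort(key=lambda t: t.score, reverse=True)
    return triplets[:top_k]


def to_scene_graph(triplets):
    """Collect triplets into a node list and labelled edges.

    Simplification: nodes are identified by class label only; a real system
    would keep one node per detected object instance (e.g. per matched box).
    """
    nodes = sorted({t.subject for t in triplets} | {t.obj for t in triplets})
    edges = [(t.subject, t.predicate, t.obj) for t in triplets]
    return {"nodes": nodes, "edges": edges}


# Toy example with made-up logits for three decoder queries.
object_classes = ["man", "horse", "hat"]
predicate_classes = ["riding", "wearing", "near"]
queries = [
    {"subj": [4.0, 0.0, 0.0], "pred": [3.0, 0.0, 0.0], "obj": [0.0, 4.0, 0.0]},
    {"subj": [4.0, 0.0, 0.0], "pred": [0.0, 2.0, 0.0], "obj": [0.0, 0.0, 4.0]},
    {"subj": [0.0, 0.0, 0.0], "pred": [0.0, 0.0, 0.0], "obj": [0.0, 0.0, 0.0]},
]
graph = to_scene_graph(decode_triplets(queries, object_classes,
                                       predicate_classes, min_score=0.1))
print(graph)
# prints {'nodes': ['hat', 'horse', 'man'], 'edges': [('man', 'riding', 'horse'), ('man', 'wearing', 'hat')]}
```

Multiplying the three probabilities means a triplet ranks highly only when the subject, predicate, and object classifiers are all confident; `min_score` then drops uninformative queries such as the uniform third query in the toy input.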

Information

Published In

ACM Other conferences
AAIA '23: Proceedings of the 2023 International Conference on Advances in Artificial Intelligence and Applications
November 2023
406 pages
ISBN:9798400708268
DOI:10.1145/3603273
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Computer vision
  2. Scene graph generation
  3. Transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Natural Science Foundation of China

Conference

AAIA 2023

Article Metrics

  • Total citations: 0
  • Total downloads: 20
  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 2
Reflects downloads up to 07 Mar 2025.