DOI: 10.1145/3343031.3356080
research-article

Relation Understanding in Videos

Published: 15 October 2019 Publication History

Abstract

In this paper, we present our solutions to the grand challenge task "Relation Understanding in Videos" at ACM Multimedia 2019. The task is to detect instances of target visual relations in a video, where each relation instance is represented by a relation triplet <subject, predicate, object> together with the trajectories of the subject and object. At first glance, the task resembles image relation detection with the input changed from images to videos. However, video relation detection requires a much more complex pipeline, which must not only detect objects in each frame but also track them over time. In this challenge, we follow the basic pipeline structure, which consists of three separate components: an object detector, an object tracker, and a relation predictor. Our analysis shows that the VidOR dataset suffers from data imbalance and missing labels, and we use two simple but effective methods to alleviate these problems. We also use trajectory Non-Maximum Suppression and a sliding window method to address the redundancy of trajectory proposals. Experimental results on the challenge task show that applying these approaches on top of the state-of-the-art relation prediction model improves video relation detection performance, with precision@1 and precision@5 reaching 0.3305 and 0.3507, respectively.
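
The abstract does not describe how trajectory Non-Maximum Suppression is computed, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a trajectory is a mapping from frame index to a box (x1, y1, x2, y2), that overlap is measured by a volumetric IoU (per-frame intersection and union summed over all frames), and that the threshold of 0.5 is arbitrary. The names `trajectory_iou` and `trajectory_nms` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout
Trajectory = Dict[int, Box]              # frame index -> box in that frame


def box_area(b: Box) -> float:
    """Area of a box, clamped to zero for degenerate boxes."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def trajectory_iou(a: Trajectory, b: Trajectory) -> float:
    """Volumetric IoU: summed per-frame intersection over summed per-frame union."""
    inter = 0.0
    union = 0.0
    for frame in set(a) | set(b):
        if frame in a and frame in b:
            ba, bb = a[frame], b[frame]
            w = min(ba[2], bb[2]) - max(ba[0], bb[0])
            h = min(ba[3], bb[3]) - max(ba[1], bb[1])
            overlap = max(0.0, w) * max(0.0, h)
            inter += overlap
            union += box_area(ba) + box_area(bb) - overlap
        else:
            # Frames covered by only one trajectory add to the union only.
            union += box_area(a[frame] if frame in a else b[frame])
    return inter / union if union > 0 else 0.0


def trajectory_nms(
    trajectories: Sequence[Trajectory],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Greedy NMS: return indices of kept trajectories, highest score first."""
    order = sorted(range(len(trajectories)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a trajectory only if it does not heavily overlap any kept one.
        if all(trajectory_iou(trajectories[i], trajectories[k]) <= threshold for k in keep):
            keep.append(i)
    return keep
```

With this formulation, two proposals that follow nearly the same object across the same frames collapse into the higher-scoring one, while proposals for different objects or disjoint time spans are both retained.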



Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN:9781450368896
DOI:10.1145/3343031

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature representation
  2. tracking
  3. video relation detection

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Plan
  • National Natural Science Foundation of China
  • Beijing Natural Science Foundation

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
