DOI: 10.1145/3474085.3475684
research-article

Linking the Characters: Video-oriented Social Graph Generation via Hierarchical-cumulative GCN

Published: 17 October 2021

ABSTRACT

Online video platforms have boomed in recent years. A graph that illustrates the social relations among characters has therefore long been expected, both to help audiences understand the story and to support fine-grained video analysis in a semantic way. Unfortunately, although humans can easily infer the social relations among characters, it remains extremely challenging for intelligent systems to capture these relations automatically from multi-modal cues. Moreover, such systems fail to describe the relations among multiple characters from a graph-generation perspective. To that end, inspired by the human ability to infer social relationships, we propose a novel Hierarchical-Cumulative Graph Convolutional Network (HC-GCN) to generate the social relation graph for multiple characters in a video. Specifically, we first integrate short-term multi-modal cues, including visual, textual, and audio information, to generate frame-level graphs for a subset of the characters via a multimodal graph convolution technique. For the video-level aggregation task, we design an end-to-end framework that aggregates all frame-level subgraphs along the temporal trajectory, yielding a global video-level social graph with various social relationships among multiple characters. Extensive validation on two real-world large-scale datasets demonstrates the effectiveness of our proposed method compared with state-of-the-art (SOTA) baselines.
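The two-level idea in the abstract (frame-level subgraphs over some of the characters, then accumulation into one video-level graph) can be illustrated with a toy sketch. This is not the HC-GCN model: the learned graph convolutions and the end-to-end aggregation are replaced here by a plain average of per-frame relation scores, and the function name, relation labels, and character names are all hypothetical.

```python
from collections import defaultdict


def aggregate_frame_graphs(frame_graphs, relations):
    """Accumulate frame-level relation scores into one video-level graph.

    frame_graphs: list of dicts mapping (char_a, char_b) -> list of scores,
                  one score per relation class, covering only the characters
                  that appear in that frame or clip.
    relations:    list of relation class names (hypothetical labels).
    Returns a dict mapping each unordered character pair to a relation name.
    """
    totals = defaultdict(lambda: [0.0] * len(relations))
    counts = defaultdict(int)
    for graph in frame_graphs:
        for (a, b), scores in graph.items():
            pair = tuple(sorted((a, b)))  # treat edges as undirected
            totals[pair] = [t + s for t, s in zip(totals[pair], scores)]
            counts[pair] += 1

    video_graph = {}
    for pair, summed in totals.items():
        # Simple averaging stands in for the learned aggregation (assumption).
        mean = [s / counts[pair] for s in summed]
        best = max(range(len(relations)), key=mean.__getitem__)
        video_graph[pair] = relations[best]
    return video_graph


# Hypothetical scores for three short clips of one video.
relations = ["friend", "colleague", "family"]
frames = [
    {("Ann", "Bob"): [0.6, 0.3, 0.1]},
    {("Bob", "Ann"): [0.2, 0.7, 0.1], ("Bob", "Cat"): [0.1, 0.1, 0.8]},
    {("Ann", "Bob"): [0.3, 0.6, 0.1]},
]
video_graph = aggregate_frame_graphs(frames, relations)
```

In this sketch a single clip can favor one relation for a pair while the accumulated evidence over the whole video favors another, which is the motivation for aggregating along the temporal trajectory rather than classifying each clip in isolation.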

Supplemental Material

meeting_02.mp4 (mp4, 26.2 MB)


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
