Linking the Characters: Video-oriented Social Graph Generation via Hierarchical-cumulative GCN

Authors:
Shiwei Wu (University of Science and Technology of China, Hefei, China)
Joya Chen (University of Science and Technology of China, Hefei, China)
Tong Xu (University of Science and Technology of China, Hefei, China)
Liyi Chen (University of Science and Technology of China, Hefei, China)
Lingfei Wu (JD.COM Silicon Valley Research Center, Santa Clara, CA, USA)
Yao Hu (Alibaba Youku Cognitive and Intelligent Lab, Beijing, China)
Enhong Chen (University of Science and Technology of China, Hefei, China)

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 4716–4724, https://doi.org/10.1145/3474085.3475684

Published: 17 October 2021

ABSTRACT

Recent years have witnessed the boom of online video platforms. In this setting, a graph that illustrates the social relations among characters has long been expected, both to help audiences better understand the story and to support fine-grained video analysis in a semantic way. Unfortunately, although humans can easily infer the social relations among characters, it remains extremely challenging for intelligent systems to capture these relations automatically from multi-modal cues. Moreover, existing approaches fail to describe the relations among multiple characters from a graph-generation perspective. To that end, inspired by the human ability to infer social relationships, we propose a novel Hierarchical-Cumulative Graph Convolutional Network (HC-GCN) to generate the social relation graph for multiple characters in a video. Specifically, we first integrate short-term multi-modal cues, including visual, textual and audio information, to generate frame-level graphs for subsets of the characters via a multimodal graph convolution technique. For the video-level aggregation task, we design an end-to-end framework that aggregates all frame-level subgraphs along the temporal trajectory, yielding a global video-level social graph with various social relationships among multiple characters. Extensive validation on two real-world large-scale datasets demonstrates the effectiveness of our proposed method compared with state-of-the-art baselines.
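To make the two stages in the abstract concrete, the following is a minimal sketch and not the HC-GCN implementation. It assumes a standard graph-convolution step in the style of Kipf and Welling for the frame level, and it substitutes plain per-pair averaging for the paper's learned end-to-end aggregation. The function names, the dictionary layout of a frame-level subgraph, and the treatment of character pairs as unordered are all illustrative assumptions.

```python
import math
from collections import defaultdict


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:    n x n adjacency matrix with non-negative entries
    feats:  n x d node features (assumed here to be fused multi-modal cues)
    weight: d x o weight matrix (learned in practice, fixed in this sketch)
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Propagate neighbour features along the normalised edges.
    prop = [[sum(norm[i][k] * feats[k][d] for k in range(n))
             for d in range(len(feats[0]))]
            for i in range(n)]
    # Apply the linear map, then ReLU.
    return [[max(0.0, sum(row[d] * weight[d][o] for d in range(len(weight))))
             for o in range(len(weight[0]))]
            for row in prop]


def aggregate_subgraphs(frame_graphs):
    """Merge frame-level subgraphs into one video-level graph.

    Each subgraph maps a character pair to a list of relation scores.
    Assumption: pairs are unordered and scores are simply averaged over
    the frames in which the pair appears. This is a stand-in for the
    learned temporal aggregation described in the abstract.
    """
    sums = {}
    counts = defaultdict(int)
    for graph in frame_graphs:
        for pair, scores in graph.items():
            key = tuple(sorted(pair))
            if key not in sums:
                sums[key] = [0.0] * len(scores)
            sums[key] = [s + x for s, x in zip(sums[key], scores)]
            counts[key] += 1
    return {key: [s / counts[key] for s in total]
            for key, total in sums.items()}


if __name__ == "__main__":
    # Two characters connected by one edge, with one-hot features.
    print(gcn_layer([[0, 1], [1, 0]],
                    [[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [0.0, 1.0]]))
    # Two frames, each covering only part of the cast.
    frames = [
        {("A", "B"): [1.0, 0.0]},
        {("B", "A"): [0.0, 1.0], ("A", "C"): [0.5, 0.5]},
    ]
    print(aggregate_subgraphs(frames))
```

Averaging keeps the sketch short; it also shows why frame-level subgraphs that each cover only some characters can still combine into a graph over the whole cast.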

Supplemental Material

meeting_02.mp4 (MP4, 26.2 MB)

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding
2. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia streaming

Published in
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Copyright © 2021 ACM
Author Tags
graph convolutional network
multimodal analysis
social relationship
video understanding