Understanding the Dynamics of Social Interactions: A Multi-Modal Multi-View Approach

Published: 17 February 2019


In this article, we address the problem of understanding human-to-human interactions, a fundamental component of social event analysis. Inspired by the recent success of multi-modal visual data in many recognition tasks, we propose a novel approach that models dyadic interaction using features extracted from synchronized 3D skeleton coordinates, depth, and Red Green Blue (RGB) sequences. From the skeleton data, we extract new view-invariant proxemic features, named the Unified Proxemic Descriptor (UProD), which incorporates intrinsic and extrinsic distances between the two interacting subjects. A novel key frame selection method is introduced to identify salient instants of the interaction sequence based on the joints’ energy. From Red Green Blue Depth (RGBD) videos, more holistic features are extracted by applying an adaptive pre-trained Convolutional Neural Network (CNN) to optical flow frames. To better understand the dynamics of interactions, we extend dyadic interaction analysis with a fundamentally new model for a previously untreated problem: discerning the active from the passive interactor. Extensive experiments have been carried out on four multi-modal and multi-view interaction datasets. The experimental results show that the proposed techniques outperform state-of-the-art approaches.
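The abstract does not define the joints’ energy or the exact form of UProD, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that a frame's energy is the sum of squared joint displacements since the previous frame, summed over both subjects, and that inter-subject distances are plain joint-to-joint Euclidean distances. All function names (`frame_energy`, `select_key_frames`, `inter_subject_distances`) are hypothetical and are not taken from the article.

```python
import math

# A skeleton sequence is a list of frames; each frame is a list of (x, y, z) joints.
# Every name here is hypothetical and only illustrates the idea in the abstract.

def frame_energy(prev_frame, frame):
    """Assumed energy: sum of squared joint displacements between two frames."""
    return sum(
        (a - b) ** 2
        for p, q in zip(prev_frame, frame)
        for a, b in zip(p, q)
    )

def select_key_frames(seq_a, seq_b, k):
    """Return the indices of the k frames with the highest combined energy."""
    energies = []
    for t in range(1, len(seq_a)):
        e = frame_energy(seq_a[t - 1], seq_a[t]) + frame_energy(seq_b[t - 1], seq_b[t])
        energies.append((e, t))
    top = sorted(energies, reverse=True)[:k]
    return sorted(t for _, t in top)

def inter_subject_distances(frame_a, frame_b):
    """Joint-to-joint Euclidean distances between the two subjects in one frame."""
    return [[math.dist(p, q) for q in frame_b] for p in frame_a]

# Toy data: subject A moves one joint at frame 2, subject B stays still.
seq_a = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 0, 0), (1, 0, 0)],
    [(0, 3, 0), (1, 0, 0)],
    [(0, 3, 0), (1, 0, 0)],
]
seq_b = [[(5, 0, 0), (6, 0, 0)] for _ in range(4)]

key_frames = select_key_frames(seq_a, seq_b, k=1)
distances = inter_subject_distances(seq_a[0], seq_b[0])
```

Euclidean distances between joints do not change when the camera is rotated or translated, which is one general reason distance-based features suit multi-view data; whether UProD relies on exactly this property is not stated in the abstract.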


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications
    Volume 15, Issue 1s
    Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data
    January 2019
    265 pages
    Association for Computing Machinery

    New York, NY, United States

    Published: 17 February 2019
    Accepted: 01 November 2018
    Revised: 01 October 2018
    Received: 01 October 2017
    Published in TOMM Volume 15, Issue 1s


    Author Tags

    1. CNN
    2. Interaction recognition
    3. RGB-D
    4. active/passive subjects
    5. multi-modal data
    6. skeleton


    Funding Sources

    • Agency for Science, Technology and Research (A*STAR), Singapore


