Investigation of Small Group Social Interactions Using Deep Visual Activity-Based Nonverbal Features

Authors:
Cigdem Beyan, Istituto Italiano di Tecnologia, Genoa, Italy
Muhammad Shahid, Istituto Italiano di Tecnologia & University of Genoa, Genoa, Italy
Vittorio Murino, Istituto Italiano di Tecnologia & University of Verona, Genoa & Verona, Italy

MM '18: Proceedings of the 26th ACM international conference on Multimedia, October 2018, Pages 311–319. https://doi.org/10.1145/3240508.3240685

Published: 15 October 2018

ABSTRACT

Understanding small-group face-to-face interactions is a prominent research problem in social psychology, and its automatic realization has recently become popular in social computing. The problem is mainly investigated in terms of nonverbal behaviors, as they are one of the main facets of communication. Among the many multi-modal nonverbal cues, visual activity is an important one, and achieving sufficiently good performance with it can be crucial, for instance when audio sensors are missing. The existing visual activity-based nonverbal features, which are all hand-crafted, perform well enough for some applications but poorly for others. Given these observations, we claim that there is a need for more robust feature representations that can be learned from the data itself. To this end, we propose a novel method composed of optical flow computation, deep neural network-based feature learning, feature encoding, and classification. We also present a comprehensive comparison of different feature encoding techniques. The proposed method is tested on three research topics that can be perceived during small group interactions, i.e., meetings: i) emergent leader detection, ii) emergent leadership style prediction, and iii) high/low extraversion classification. The proposed method shows (significantly) better results not only compared to the state-of-the-art visual activity-based nonverbal features, but also compared to those features combined with other audio-based and video-based nonverbal features.
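
The four stages named in the abstract (motion computation, per-frame feature extraction, feature encoding, classification) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: frame differencing stands in for optical flow, two summary statistics stand in for learned deep features, a mean/standard-deviation summary stands in for the feature encoding, and a fixed threshold stands in for the classifier. All function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def motion_map(prev: Frame, curr: Frame) -> List[float]:
    """Stand-in for optical flow (assumption): absolute per-pixel change."""
    return [abs(c - p) for prow, crow in zip(prev, curr) for p, c in zip(prow, crow)]


def frame_features(motion: Sequence[float]) -> List[float]:
    """Stand-in for learned deep features (assumption): mean and peak motion."""
    return [fmean(motion), max(motion)]


def encode(features: Sequence[Sequence[float]]) -> List[float]:
    """Turn a variable-length feature sequence into one fixed-length vector
    by taking the per-dimension mean and population standard deviation."""
    columns = list(zip(*features))
    return [fmean(c) for c in columns] + [pstdev(c) for c in columns]


def describe(video: Sequence[Frame]) -> List[float]:
    """Run the first three stages on a clip with at least two frames."""
    maps = [motion_map(a, b) for a, b in zip(video, video[1:])]
    return encode([frame_features(m) for m in maps])


def classify(descriptor: Sequence[float], threshold: float) -> str:
    """Stand-in classifier (assumption): threshold on average motion."""
    return "high" if descriptor[0] > threshold else "low"


blank = [[0.0, 0.0], [0.0, 0.0]]
bright = [[1.0, 1.0], [1.0, 1.0]]
still_clip = [blank, blank, blank]
active_clip = [blank, bright, blank]

print(classify(describe(still_clip), threshold=0.5))
print(classify(describe(active_clip), threshold=0.5))
```

The encoding stage is what lets clips of different lengths share one classifier: however many frames a clip has, `describe` returns a vector whose length depends only on the number of per-frame features.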

Index Terms

Applied computing → Law, social and behavioral sciences → Psychology

Published in
MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
General Chairs:
Susanne Boll, University of Oldenburg, Germany
Kyoung Mu Lee, Seoul National University, Korea
Jiebo Luo, University of Rochester, USA
Wenwu Zhu, Tsinghua University, China
Program Chairs:
Hyeran Byun, Yonsei University, Korea
Chang Wen Chen, State Univ. of New York at Buffalo, USA
Rainer Lienhart, University of Augsburg, Germany
Tao Mei, JD AI, China
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
deep neural network
feature encoding
meetings
nonverbal behavior
small groups
social interactions
visual activity
Qualifiers
- research-article
Acceptance Rates
MM '18 paper acceptance rate: 209 of 757 submissions, 28%
Overall acceptance rate: 995 of 4,171 submissions, 24%