Modeling the synchrony between interacting people: application to role recognition

Fang, Sheng; Achard, Catherine; Dubuisson, Séverine

doi:10.1007/s11042-016-4267-4

Published: 29 December 2016

Volume 77, pages 503–518, (2018)

Multimedia Tools and Applications

Abstract

The study of social interactions has attracted increasing attention. Role recognition is one of its possible applications and the core of this study. This article proposes approaches to automatically recognize the roles of the participants in a meeting by modeling the synchrony of temporal nonverbal audio features. In our approach, the Influence Model (IM), a Hidden Markov Model (HMM)-like model, is used to model this synchrony and to extract from the input data a feature vector that contains information about both temporal transitions (intra-personal data) and interactions between participants (inter-personal data). This model of the meeting is used as input to a Random Forest (RF) classifier for the role recognition task. The experiments are performed on 138 meetings (approximately 45 hours of recordings) from the Augmented Multiparty Interaction (AMI) Corpus. Accuracy scores show that this combination of generative (IM) and discriminative (RF) approaches outperforms state-of-the-art role recognition rates.
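
To make the idea of a feature vector combining intra-personal and inter-personal information concrete, here is a minimal Python sketch. It is not the authors' implementation: the Influence Model in the article is a learned generative model, whereas this sketch substitutes plain empirical transition frequencies computed from discrete speaking-activity sequences. The function names (`transition_matrix`, `synchrony_features`), the binary speaking/silent states, and the toy data are all assumptions made for illustration.

```python
def transition_matrix(prev_states, next_states, n_states=2):
    """Row-normalised counts of prev_states[t] -> next_states[t + 1]."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(prev_states[:-1], next_states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row when a state was never visited.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix


def synchrony_features(sequences, target, n_states=2):
    """Feature vector for participant `target` (hypothetical layout).

    First the target's own transitions (intra-personal), then, for every
    other participant, the transitions from that participant's previous
    state to the target's next state (inter-personal).
    """
    own = sequences[target]
    features = [p for row in transition_matrix(own, own, n_states) for p in row]
    for other, seq in enumerate(sequences):
        if other != target:
            cross = transition_matrix(seq, own, n_states)
            features.extend(p for row in cross for p in row)
    return features


# Toy speaking activity (1 = speaking, 0 = silent), invented for illustration.
meeting = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
features = synchrony_features(meeting, target=0)
print(features)
```

One such vector per participant could then be passed to a discriminative classifier such as a random forest (for example scikit-learn's `RandomForestClassifier`), mirroring the generative-then-discriminative combination described in the abstract.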

Acknowledgments

This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.

Author information

Authors and Affiliations

Sorbonne Universités, UPMC Univ Paris 06 CNRS, UMR 7222, F-75005, Paris, France
Sheng Fang, Catherine Achard & Séverine Dubuisson

Corresponding author

Correspondence to Sheng Fang.

About this article

Cite this article

Fang, S., Achard, C. & Dubuisson, S. Modeling the synchrony between interacting people: application to role recognition. Multimed Tools Appl 77, 503–518 (2018). https://doi.org/10.1007/s11042-016-4267-4

Received: 16 June 2016
Revised: 27 October 2016
Accepted: 13 December 2016
Published: 29 December 2016
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11042-016-4267-4
