Abstract
The study of social interactions has attracted increasing attention. Role recognition is one of its possible applications and is the core of this study. This article proposes approaches to automatically recognize the roles of meeting participants by modeling the synchrony of temporal nonverbal audio features. In our approach, the Influence Model (IM), a model similar to a Hidden Markov Model (HMM), is used to model this synchrony and to extract from the input data a feature vector that contains information both about temporal transitions (intra-personal data) and about the interaction between participants (inter-personal data). This representation of the meeting is then used as input to Random Forests (RFs) for the role recognition task. The experiments are performed on 138 meetings (approximately 45 hours of recordings) from the Augmented Multiparty Interaction (AMI) Corpus. Accuracy scores show that this combination of generative (IM) and discriminative (RFs) approaches achieves higher role recognition rates than the state of the art.
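The abstract does not say here how the IM is fitted or which of its parameters form the feature vector, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each participant's state is observed as binary speaking status (so the hidden layer of the full, HMM-like IM is dropped) and that all sequences have equal length. For a target participant i, the IM transition is a convex mixture of pairwise transition matrices, P(s_i(t+1) | s(t)) = sum_j alpha_ij A_ij(s_j(t), s_i(t+1)), and the sketch estimates alpha and A by EM with add-one smoothing. The feature vector is assumed to be the flattened self-transition matrix A_ii (intra-personal) followed by the influence weights alpha_i (inter-personal). The names `fit_influence_row`, `influence_features` and `N_STATES` are hypothetical.

```python
"""Influence-model features from binary speaking-status sequences (sketch).

Assumptions (not taken from the article): each participant's state is
observed (1 = speaking, 0 = silent), so the hidden layer of the full IM
is dropped, and all sequences have the same length. For a target chain i
the IM transition is a convex mixture of pairwise transition matrices:
    P(s_i(t+1) | s(t)) = sum_j alpha[j] * A[j][s_j(t)][s_i(t+1)]
whose parameters are estimated here by EM.
"""
from typing import List, Tuple

N_STATES = 2  # hypothetical state space: 0 = silent, 1 = speaking


def fit_influence_row(
    chains: List[List[int]], i: int, n_iter: int = 50
) -> Tuple[List[float], List[List[List[float]]]]:
    """EM estimate of the influence weights alpha and matrices A for chain i."""
    n, steps = len(chains), len(chains[0]) - 1
    # Uniform responsibilities make the first M-step plain transition counts.
    resp = [[1.0 / n] * n for _ in range(steps)]
    alpha: List[float] = []
    A: List[List[List[float]]] = []
    for _ in range(n_iter):
        # M-step (A): responsibility-weighted counts with add-one smoothing.
        A = [[[1.0] * N_STATES for _ in range(N_STATES)] for _ in range(n)]
        for t in range(steps):
            nxt = chains[i][t + 1]
            for j in range(n):
                A[j][chains[j][t]][nxt] += resp[t][j]
        for j in range(n):
            A[j] = [[c / sum(row) for c in row] for row in A[j]]
        # M-step (alpha): average responsibility of each influencing chain.
        alpha = [sum(r[j] for r in resp) / steps for j in range(n)]
        # E-step: posterior probability that chain j drove each transition.
        for t in range(steps):
            nxt = chains[i][t + 1]
            w = [alpha[j] * A[j][chains[j][t]][nxt] for j in range(n)]
            z = sum(w)
            resp[t] = [x / z for x in w]
    return alpha, A


def influence_features(chains: List[List[int]], n_iter: int = 50) -> List[List[float]]:
    """Per participant: flattened self-transition matrix A_ii (intra-personal)
    followed by the influence weights alpha_i (inter-personal)."""
    features = []
    for i in range(len(chains)):
        alpha, A = fit_influence_row(chains, i, n_iter)
        intra = [p for row in A[i] for p in row]
        features.append(intra + alpha)
    return features


if __name__ == "__main__":
    # Toy three-person exchange in which speakers take turns (hypothetical data).
    toy = [
        [1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    ]
    for person, vec in enumerate(influence_features(toy)):
        print(person, [round(v, 3) for v in vec])
```

Each per-participant vector could then be paired with that participant's annotated role and passed to a Random Forest implementation (for example scikit-learn's `RandomForestClassifier`); that step is omitted to keep the sketch dependency-free. Because alpha is estimated separately for each target chain, the influence of participant j on i is in general different from that of i on j, which is what lets the inter-personal part of the vector differ between roles.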
Acknowledgments
This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.
Cite this article
Fang, S., Achard, C. & Dubuisson, S. Modeling the synchrony between interacting people: application to role recognition. Multimed Tools Appl 77, 503–518 (2018). https://doi.org/10.1007/s11042-016-4267-4