Modeling the synchrony between interacting people: application to role recognition

Multimedia Tools and Applications

Abstract

The study of social interactions has attracted increasing attention. Role recognition is one of its possible applications and the focus of this study. This article proposes approaches to automatically recognize the roles of meeting participants by modeling the synchrony of temporal nonverbal audio features. In our approach, the Influence Model (IM), a model similar to a Hidden Markov Model (HMM), is used to model this synchrony and to extract from the input data a feature vector that contains information about both temporal transitions (intra-personal data) and interactions between participants (inter-personal data). This representation of the meeting is used as input to a Random Forest (RF) classifier for the role recognition task. The experiments are performed on 138 meetings (approximately 45 hours of recordings) from the Augmented Multiparty Interaction (AMI) Corpus. Accuracy scores show that this combination of generative (IM) and discriminative (RF) approaches outperforms state-of-the-art role recognition rates.
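
As a rough illustration of what a feature vector combining intra-personal and inter-personal information might look like, the following sketch estimates pairwise state-transition probabilities between participants. It is not the method of the article: the Influence Model has hidden states and learned influence weights, whereas this sketch substitutes smoothed counts over observed binary speaking states. The function names, the smoothing value, and the toy data are all assumptions made for illustration.

```python
from itertools import product


def pairwise_transitions(seq_from, seq_to, n_states=2, smoothing=1.0):
    """Estimate P(seq_to[t] = b | seq_from[t-1] = a) from smoothed counts.

    Hypothetical helper: a count-based stand-in for the transition
    parameters that an Influence Model would learn.
    """
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for prev, nxt in zip(seq_from[:-1], seq_to[1:]):
        counts[prev][nxt] += 1
    # Normalise each row so that it sums to 1.
    return [[c / sum(row) for c in row] for row in counts]


def meeting_features(sequences, n_states=2):
    """Flatten all pairwise transition estimates into one feature vector.

    Pairs with i == j describe a participant's own temporal transitions
    (intra-personal); pairs with i != j describe how participant j's
    previous state relates to participant i's next state (inter-personal).
    """
    features = []
    n = len(sequences)
    for i, j in product(range(n), repeat=2):
        matrix = pairwise_transitions(sequences[j], sequences[i], n_states)
        features.extend(p for row in matrix for p in row)
    return features


# Invented toy data: two participants, 1 = speaking, 0 = silent.
toy = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
vector = meeting_features(toy)
```

A vector of this kind, computed per meeting or per participant, could then be passed to a discriminative classifier such as a Random Forest, which is the second stage described in the abstract and is not shown here.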

Acknowledgments

This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.

Author information

Corresponding author

Correspondence to Sheng Fang.

About this article

Cite this article

Fang, S., Achard, C. & Dubuisson, S. Modeling the synchrony between interacting people: application to role recognition. Multimed Tools Appl 77, 503–518 (2018). https://doi.org/10.1007/s11042-016-4267-4

  • DOI: https://doi.org/10.1007/s11042-016-4267-4
