DOI: 10.1145/3340555.3353761 · Research Article

Improved Visual Focus of Attention Estimation and Prosodic Features for Analyzing Group Interactions

Published: 14 October 2019

ABSTRACT

Collaborative group tasks require efficient and productive verbal and non-verbal interactions among the participants. Studying such interaction patterns could help groups perform more efficiently, but detecting and measuring human behavior is challenging, since it is inherently multimodal and changes on millisecond time scales. In this paper, we present a method to study groups performing a collaborative decision-making task using non-verbal behavioral cues. First, we present a novel algorithm to estimate the visual focus of attention (VFOA) of participants using frontal cameras. The algorithm can be used in various group settings and achieves a state-of-the-art accuracy of 90%. Second, we present prosodic features for non-verbal speech analysis. These features are commonly used in speech/music classification tasks but are rarely used in human group interaction analysis. We validate our algorithms on a multimodal dataset of 14 group meetings with 45 participants, and show that a combination of VFOA-based visual metrics and prosodic-feature-based metrics can predict emergent group leaders with 64% accuracy and dominant contributors with 86% accuracy. We also report our findings on the correlations between the non-verbal behavioral metrics and gender, emotional intelligence, and the Big 5 personality traits.
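The prosodic features mentioned above, borrowed from speech/music classification, typically include measures such as spectral flatness and zero-crossing rate. The snippet below is an illustrative sketch of two such features, not the paper's actual feature set or implementation; the frame length, sample rate, and signals are hypothetical examples chosen only to show how tonal (voiced-like) and noisy (unvoiced-like) frames separate.

```python
import numpy as np

def spectral_flatness(frame):
    # Geometric mean over arithmetic mean of the power spectrum:
    # near 1.0 for noise-like frames, near 0.0 for tonal frames.
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)

def zero_crossing_rate(frame):
    # Fraction of adjacent samples whose signs differ.
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

fs = 16000                             # assumed sample rate (Hz)
t = np.arange(fs // 10) / fs           # one 100 ms analysis frame
tone = np.sin(2 * np.pi * 220 * t)     # tonal, voiced-like signal
noise = np.random.default_rng(0).standard_normal(t.size)  # noise-like

# A tonal frame has much lower flatness and ZCR than a noisy one.
print(spectral_flatness(tone) < spectral_flatness(noise))   # True
print(zero_crossing_rate(tone) < zero_crossing_rate(noise)) # True
```

In a classification pipeline these values would be computed per frame and aggregated (e.g., mean and variance over an utterance) before being fed to a classifier such as an SVM.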


Published in

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019, 601 pages
ISBN: 9781450368605
DOI: 10.1145/3340555

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)
