Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments

  • Chapter
Handbook of Ambient Intelligence and Smart Environments

Abstract

Among the many possibilities for automation enabled by multi-sensor environments (several of which are discussed in this Handbook), a particularly relevant one is the analysis of social interaction in the workplace and, more specifically, of conversational group interaction. Group conversations are ubiquitous and represent a fundamental means through which ideas are discussed, progress is reported, and knowledge is created and disseminated.

Author information

Corresponding authors

Correspondence to Daniel Gatica-Perez or Jean-Marc Odobez.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Gatica-Perez, D., Odobez, JM. (2010). Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments. In: Nakashima, H., Aghajan, H., Augusto, J.C. (eds) Handbook of Ambient Intelligence and Smart Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-93808-0_16

  • DOI: https://doi.org/10.1007/978-0-387-93808-0_16

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-93807-3

  • Online ISBN: 978-0-387-93808-0

  • eBook Packages: Computer Science, Computer Science (R0)
