ABSTRACT
Personal assistants that use a command-dialogue model of speech recognition, such as Siri and Cortana, have become increasingly powerful and popular for individual use. In this paper we explore whether similar techniques could be used to build a speech-based agent that, in a group meeting, would monitor the spoken dialogue, proactively detect useful actions, and carry out those actions without anyone speaking an explicit command. Using a low-fi technical probe, we investigated how such a system might perform in a collaborative work setting and how users might respond to it. We recorded and transcribed nine varied meetings, generated simulated lists of automated 'action items' from them, and then asked the meeting participants to review these lists retrospectively. The low rankings participants gave these discovered items suggest that applying personal assistant technology to the group setting is difficult, and we document the issues that emerged from the study. Through our observations, we also explored the nature of meetings and the challenges they present for speech agents.
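The abstract does not say how the simulated action-item lists were produced. Purely as an illustration of why proactive detection from meeting talk is hard, the following is a minimal sketch of a naive cue-phrase detector that a speech agent might run over a transcript. The cue patterns, the `Utterance` and `ActionItem` types, the `detect_action_items` function, and the scoring threshold are all hypothetical assumptions, not the authors' procedure.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns (assumption, not from the study):
# commitments, requests, and common task verbs.
CUE_PATTERNS = [
    re.compile(r"\b(i'll|i will|we'll|we will|let's)\b", re.IGNORECASE),
    re.compile(r"\b(can you|could you|please)\b", re.IGNORECASE),
    re.compile(r"\b(send|email|schedule|book|follow up)\b", re.IGNORECASE),
]


@dataclass
class Utterance:
    speaker: str
    text: str


@dataclass
class ActionItem:
    speaker: str
    text: str
    score: int  # number of distinct cue patterns matched


def detect_action_items(transcript, threshold=2):
    """Flag utterances that match at least `threshold` cue patterns."""
    items = []
    for utt in transcript:
        score = sum(1 for pattern in CUE_PATTERNS if pattern.search(utt.text))
        if score >= threshold:
            items.append(ActionItem(utt.speaker, utt.text, score))
    return items


# Hypothetical transcript fragment for demonstration.
transcript = [
    Utterance("A", "I'll send the slides to everyone tomorrow."),
    Utterance("B", "Could you book the room for Friday?"),
    Utterance("C", "I will send you a picture of my cat, ha."),
    Utterance("A", "That went well last quarter."),
]

items = detect_action_items(transcript)
for item in items:
    print(item.speaker, item.score, item.text)
```

The third utterance is flagged even though it is a joke rather than a task, which shows how surface cues alone can surface items a participant would likely not value; recognising such cases depends on the conversational context the meeting shares.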
Index Terms
- More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings