
More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings

Authors:
Moira McGregor, Stockholm University & Microsoft, Stockholm, Sweden
John C. Tang, Microsoft Research, Mountain View, CA, USA

CSCW '17: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, February 2017, Pages 2208–2220. https://doi.org/10.1145/2998181.2998335

Published: 25 February 2017

ABSTRACT

Personal assistants using a command-dialogue model of speech recognition, such as Siri and Cortana, have become increasingly powerful and popular for individual use. In this paper we explore whether similar techniques could be used to create a speech-based agent system which, in a group meeting setting, would similarly monitor spoken dialogue, pro-actively detect useful actions, and carry out those actions without specific commands being spoken. Using a low-fi technical probe, we investigated how such a system might perform in the collaborative work setting and how users might respond to it. We recorded and transcribed a varied set of nine meetings from which we generated simulated lists of automated 'action items', which we then asked the meeting participants to review retrospectively. The low rankings given on these discovered items are suggestive of the difficulty in applying personal assistant technology to the group setting, and we document the issues emerging from the study. Through observations, we explored the nature of meetings and the challenges they present for speech agents.

Index Terms

1. Human-centered computing
  1. Collaborative and social computing


Published in
CSCW '17: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
February 2017
2556 pages
ISBN: 9781450343350
DOI: 10.1145/2998181
General Chairs: Charlotte P. Lee (University of Washington), Steve Poltrock (Retired)
Program Chairs: Louise Barkhuus (Stockholm University and Cornell Tech), Marcos Borges (Universidade Federal do Rio de Janeiro), Wendy Kellogg (IBM T.J. Watson Research Center)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
automatic speech recognition
collaborative workplace technology
meeting agents
speech interaction
Qualifiers
- research-article
Acceptance Rates
CSCW '17 Paper Acceptance Rate: 183 of 530 submissions, 35%
Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%