Examining modality usage in a conversational multimodal application for mobile e-mail access

International Journal of Speech Technology

Abstract

As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that combines natural language understanding with a WAP browser to access e-mail messages on a cell phone. We present results from a laboratory trial that evaluated how participants used the system, and that compared the multimodal system with a text-only system representative of products currently on the market. We discuss the modality issues we observed and highlight the implementation problems and usability concerns encountered during the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most had little or no prior experience with speech systems (though they did have prior experience with text-only access to applications on their phones). To our knowledge, this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. We present design implications drawn from the study findings and the usability issues encountered, to inform the design of future conversational multimodal mobile applications.
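The design outlined above has speech input, interpreted by a natural language understanding component, and text input, entered through a WAP browser, drive the same e-mail session. As a minimal sketch of that idea (not the authors' implementation; the class names, command set, and keyword-matching "NLU" below are all hypothetical simplifications), a dispatcher might route either modality onto one shared command vocabulary:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InputEvent:
    modality: str  # "speech" or "text"
    payload: str   # recognized utterance or WAP menu selection

# A shared command vocabulary lets either modality drive navigation,
# so voice and the browser operate on the same mailbox state.
COMMANDS = {"read", "next", "previous", "delete", "reply"}

def interpret_speech(utterance: str) -> Optional[str]:
    """Toy stand-in for the NLU layer: return the first known
    command word found in the recognized utterance."""
    for token in utterance.lower().split():
        if token in COMMANDS:
            return token
    return None

def interpret_text(selection: str) -> Optional[str]:
    """WAP menu selections are assumed to map one-to-one to commands."""
    return selection if selection in COMMANDS else None

def dispatch(event: InputEvent) -> str:
    """Route either modality onto the shared command set."""
    if event.modality == "speech":
        command = interpret_speech(event.payload)
    else:
        command = interpret_text(event.payload)
    if command is None:
        return "re-prompt: input not understood"
    return f"executing '{command}' on the current message"

# Either modality reaches the same handler:
print(dispatch(InputEvent("speech", "please read that message")))
print(dispatch(InputEvent("text", "delete")))
```

In the system studied, the natural language understanding layer is of course far richer than this keyword match, and the WAP browser supplies structured selections rather than raw strings; the sketch only illustrates how two input modalities can converge on a single navigation state.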



Author information


Corresponding author

Correspondence to Jennifer Lai.


About this article

Cite this article

Lai, J., Mitchell, S. & Pavlovski, C. Examining modality usage in a conversational multimodal application for mobile e-mail access. Int J Speech Technol 10, 17–30 (2007). https://doi.org/10.1007/s10772-009-9017-9

