Examining modality usage in a conversational multimodal application for mobile e-mail access

International Journal of Speech Technology

Abstract

As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that combines natural language understanding with a WAP browser to access e-mail messages on a cell phone. We present results from a laboratory trial that evaluated how participants used the system, and that compared the multimodal system with a text-only system representative of products currently on the market. We discuss the modality issues we observed and highlight the implementation problems and usability concerns encountered during the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most had little or no prior experience with speech systems (though they did have prior experience with text-only access to applications on their phones). To our knowledge, this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. We present design implications drawn from the study findings and the usability issues encountered, to inform the design of future conversational multimodal mobile applications.
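The design outlined above has speech input, interpreted by a natural language understanding component, and text input, entered through a WAP browser, drive the same e-mail session. As a minimal sketch of that idea (not the authors' implementation; the class names, command set, and keyword-matching "NLU" below are all hypothetical simplifications), a dispatcher might route either modality onto one shared command vocabulary:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InputEvent:
    modality: str  # "speech" or "text"
    payload: str   # recognized utterance or WAP menu selection

# A shared command vocabulary lets either modality drive navigation,
# so voice and the browser operate on the same mailbox state.
COMMANDS = {"read", "next", "previous", "delete", "reply"}

def interpret_speech(utterance: str) -> Optional[str]:
    """Toy stand-in for the NLU layer: return the first known
    command word found in the recognized utterance."""
    for token in utterance.lower().split():
        if token in COMMANDS:
            return token
    return None

def interpret_text(selection: str) -> Optional[str]:
    """WAP menu selections are assumed to map one-to-one to commands."""
    return selection if selection in COMMANDS else None

def dispatch(event: InputEvent) -> str:
    """Route either modality onto the shared command set."""
    if event.modality == "speech":
        command = interpret_speech(event.payload)
    else:
        command = interpret_text(event.payload)
    if command is None:
        return "re-prompt: input not understood"
    return f"executing '{command}' on the current message"

# Either modality reaches the same handler:
print(dispatch(InputEvent("speech", "please read that message")))
print(dispatch(InputEvent("text", "delete")))
```

In the system studied, the natural language understanding layer is of course far richer than this keyword match, and the WAP browser supplies structured selections rather than raw strings; the sketch only illustrates how two input modalities can converge on a single navigation state.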



Author information


Corresponding author

Correspondence to Jennifer Lai.


About this article

Cite this article

Lai, J., Mitchell, S. & Pavlovski, C. Examining modality usage in a conversational multimodal application for mobile e-mail access. Int J Speech Technol 10, 17–30 (2007). https://doi.org/10.1007/s10772-009-9017-9

