
Multimodal Interfaces for Cell Phones and Mobile Technology

Published: International Journal of Speech Technology, 2005

Abstract

By modeling users' natural spoken and multimodal communication patterns, more powerful and highly reliable interfaces can be designed to support emerging mobile technology. In this paper, we highlight three examples of research that is advancing the state of the art in mobile technology. The first is the development of fusion-based multimodal systems, such as those combining speech with pen or touch input, which substantially improve the robustness and stability of system recognition. The second is the modeling of multimodal communication patterns to establish open-microphone engagement techniques that work in challenging multi-person mobile settings. The third is new approaches to adaptive processing, which transparently guide user input to match system processing capabilities. All three research directions are contributing to the design of more reliable, usable, and commercially promising mobile systems of the future.
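The first of these directions, fusion-based multimodal processing, rests on combining the n-best hypotheses of a speech recognizer with those of a pen or touch recognizer, so that each mode can disambiguate the other. The following minimal late-fusion sketch is a hedged illustration only, not the authors' architecture; the interpretations, confidence values, and compatibility table are hypothetical.

    from itertools import product

    # Hypothetical n-best lists from two recognizers: (interpretation, confidence).
    speech_nbest = [("zone two down town", 0.55), ("zoom to downtown", 0.30)]
    pen_nbest = [("circle: downtown_area", 0.60), ("arrow: north", 0.25)]

    # Hypothetical compatibility relation: which spoken commands unify with
    # which pen gestures into a single legal multimodal command.
    compatible = {("zoom to downtown", "circle: downtown_area")}

    def fuse(speech, pen):
        """Score every speech/pen pair, keep only pairs that unify into a
        legal command, and return the pair with the highest joint confidence."""
        candidates = [((s, g), ps * pg)
                      for (s, ps), (g, pg) in product(speech, pen)
                      if (s, g) in compatible]
        return max(candidates, key=lambda c: c[1]) if candidates else None

    # Speech alone would return the misrecognition "zone two down town";
    # fusion recovers the intended command because only the lower-ranked
    # speech hypothesis unifies with the pen gesture. This is the intuition
    # behind mutual disambiguation of recognition errors.
    print(fuse(speech_nbest, pen_nbest))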



Author information

Correspondence to Sharon Oviatt.


About this article

Cite this article

Oviatt, S., Lunsford, R. Multimodal Interfaces for Cell Phones and Mobile Technology. Int J Speech Technol 8, 127–132 (2005). https://doi.org/10.1007/s10772-005-2164-8

