
The MSIIA Experiment: Using Speech to Enhance Human Performance on a Cognitive Task

International Journal of Speech Technology

Abstract

We performed an exploratory study to examine the effects of speech-enabled input on a cognitive task: analyzing and annotating objects in aerial reconnaissance videos. We added speech input to an information fusion system to allow hands-free annotation and measured its effect on efficiency, quality, task success, and user satisfaction. We hypothesized that speech recognition could be a cognitive-enabling technology, reducing the mental load of instrument manipulation and freeing resources for the task at hand.

Despite participants' lack of confidence in the accuracy and temporal precision of the speech-enabled input, each reported that speech made it easier and faster to annotate images. When speech input was available, participants chose it over manual input for all annotations. Several noted that the additional modality greatly reduced the need to navigate controls and allowed them to focus more on the task. Quantitative results suggest that people may identify images faster with speech. However, they did not annotate better with speech: precision was lower, and recall was significantly lower. We attribute the lower precision and recall scores to the lack of undo and editing capabilities and to naïve users' limited experience in an unfamiliar domain.
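The precision and recall measures above can be illustrated with a minimal sketch. This is not the authors' evaluation code; the `(frame, label)` annotation representation and the example values are assumptions for illustration only.

```python
# Illustrative sketch (hypothetical data): precision and recall of a set of
# predicted annotations against a gold-standard set. An annotation is modeled
# here as a (frame, label) pair.

def precision_recall(predicted, gold):
    """Return (precision, recall) for two sets of annotations."""
    true_positives = len(predicted & gold)  # annotations that match the gold set
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations from one session.
gold = {(12, "truck"), (40, "building"), (77, "runway"), (90, "truck")}
predicted = {(12, "truck"), (40, "building"), (85, "road")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Under this framing, the missing undo/edit capability would depress both measures: an uncorrectable misrecognition adds a spurious annotation (hurting precision) while the intended annotation never enters the set (hurting recall).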

This formative study has provided feedback for further development of the system augmented with speech-enabled input, as our results show that the availability of speech may lead to improved performance of expert domain users on more complicated tasks.



Cite this article

Damianos, L., Loehr, D., Burke, C. et al. The MSIIA Experiment: Using Speech to Enhance Human Performance on a Cognitive Task. International Journal of Speech Technology 6, 133–144 (2003). https://doi.org/10.1023/A:1022334530417
