
VoiceMarks: restructuring hierarchical voice menus for improving navigation

Published in the International Journal of Speech Technology

Abstract

Interactive Voice Response (IVR) systems, or touch-tone telephony interfaces, are now a common medium of interaction between organizations and their customers, allowing users to access or enter company-specific information. These telephony interfaces typically rely on hierarchically structured voice menus, through which a user has to navigate in order to locate a desired menu item. This navigation process is often inefficient and time-consuming, frequently leaving users frustrated and annoyed. In this paper, we describe the foundation of VoiceMarks, a system designed to improve the ease and efficiency of navigation in menu-based voice interfaces. The system provides personalized menus through the use of voicemarks, in a process similar to bookmarking but adapted to voice interfaces. Voicemarks are essentially bookmarked nodes in the voice menu hierarchy, which are stored for the respective user in a directly accessible, personal menu. We developed and tested VoiceMarks interfaces for two applications: a bus schedule information system and a cinema ticket purchase system. A comparative study of the VoiceMarks and traditional interfaces of these applications showed that VoiceMarks can significantly improve the interaction between users and systems, both in the time and number of keystrokes needed to locate a menu item and in user satisfaction. In general, users responded very positively to the VoiceMarks interface. In addition, the study pointed to some useful modifications of VoiceMarks that should be considered before deploying the system in a commercial setting.
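The core mechanism the abstract describes (bookmarking a node in the menu hierarchy and replaying it from a per-user personal menu) can be sketched in a few lines. This is a minimal illustrative model, not the authors' implementation; all class and method names (`MenuNode`, `VoiceMarkStore`, `bookmark`, `personal_menu`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MenuNode:
    """A node in a hierarchical voice menu (an IVR prompt or leaf item)."""
    label: str
    # Maps a keypress (e.g. "1") to the submenu reached by that key.
    children: dict = field(default_factory=dict)

    def add(self, key: str, label: str) -> "MenuNode":
        """Attach and return a child node reachable via the given keypress."""
        child = MenuNode(label)
        self.children[key] = child
        return child

class VoiceMarkStore:
    """Per-user store of bookmarked menu nodes, exposed as a personal menu."""
    def __init__(self):
        self.marks = {}  # user id -> ordered list of bookmarked nodes

    def bookmark(self, user: str, node: MenuNode) -> int:
        """Save a node for this user; return its slot in the personal menu."""
        slots = self.marks.setdefault(user, [])
        if node not in slots:
            slots.append(node)
        return slots.index(node)

    def personal_menu(self, user: str) -> list:
        """Labels the user would hear when opening their personal menu."""
        return [n.label for n in self.marks.get(user, [])]

# Build a toy bus-schedule hierarchy and bookmark a deeply nested node,
# so the user can later jump to it directly instead of re-navigating.
root = MenuNode("Main menu")
schedules = root.add("1", "Bus schedules")
route = schedules.add("3", "Route 60 departures")

store = VoiceMarkStore()
slot = store.bookmark("alice", route)
```

The point of the sketch is that a voicemark is just a stored reference to an interior node: the personal menu flattens a multi-step traversal into a single keypress, which is where the reported savings in time and keystrokes come from.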



Author information


Correspondence to Pourang Irani.


Cite this article

Irani, P., Shajahan, P. & Kemke, C. VoiceMarks: restructuring hierarchical voice menus for improving navigation. Int J Speech Technol 9, 75–94 (2006). https://doi.org/10.1007/s10772-008-9007-3


Keywords

Navigation