Abstract
Interactive Voice Response (IVR) systems, or touch-tone telephony interfaces, are now a common medium of interaction between organizations or companies and their customers, allowing users to access or enter company-specific information. These telephony interfaces typically employ hierarchically structured voice menus, through which a user must navigate in order to locate a specific desired menu item. This navigation process is often inefficient and time-consuming, at times leaving users frustrated and annoyed. In this paper, we describe the foundation of VoiceMarks, a system designed to improve the ease and efficiency of navigation in menu-based voice interfaces. The system features personalized menus through the use of voicemarks, created in a process similar to bookmarking but adapted to voice interfaces. Voicemarks are essentially bookmarked nodes in the voice menu hierarchy, stored for each user in a directly accessible personal menu. We developed and tested VoiceMarks interfaces for two applications: a bus schedule information system and a cinema ticket purchase system. A comparative study of the VoiceMarks and traditional interfaces of these applications showed that VoiceMarks can significantly improve the interaction between users and systems, in terms of both the time and the number of keystrokes needed to locate a menu item, as well as user satisfaction. In general, users responded very positively to the VoiceMarks interface. In addition, the study pointed to some useful modifications of VoiceMarks that should be considered before deploying the system in a commercial setting.
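The bookmarking mechanism the abstract describes can be sketched as a small data structure: a hierarchical menu tree plus a per-user table mapping bookmark slots to the key sequences that reach the bookmarked nodes. The following is a minimal illustrative sketch, not the paper's implementation; all class and method names are our own, and we assume keys 1..n select child items in listed order.

```python
class VoiceMenu:
    """A hierarchical touch-tone menu: nested dicts, leaves are None."""

    def __init__(self, tree):
        self.tree = tree

    def navigate(self, keys):
        """Follow a sequence of keypresses from the root; return the
        labels of the visited nodes (the path to the reached item)."""
        node, path = self.tree, []
        for k in keys:
            labels = list(node)          # children in listed (menu) order
            label = labels[k - 1]        # key 1 selects the first item
            path.append(label)
            node = node[label] or {}     # descend; leaves have no children
        return path


class UserProfile:
    """Per-user voicemarks: slot number -> stored key sequence."""

    def __init__(self):
        self.voicemarks = {}

    def bookmark(self, slot, keys):
        self.voicemarks[slot] = list(keys)

    def jump(self, menu, slot):
        """Reach a bookmarked node directly, skipping manual navigation."""
        return menu.navigate(self.voicemarks[slot])


# Hypothetical bus-schedule menu in the spirit of the paper's test application.
bus = VoiceMenu({"Routes": {"Route 11": None, "Route 60": None}, "Fares": None})
user = UserProfile()
user.bookmark(1, [1, 2])                 # voicemark slot 1 -> Routes > Route 60
print(user.jump(bus, 1))                 # one slot press replaces two menu levels
```

The point of the sketch is the trade the study measured: reaching a deep item costs one keypress per menu level, whereas a voicemark in the personal menu reaches it in a single step regardless of depth.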
Cite this article
Irani, P., Shajahan, P. & Kemke, C. VoiceMarks: restructuring hierarchical voice menus for improving navigation. Int J Speech Technol 9, 75–94 (2006). https://doi.org/10.1007/s10772-008-9007-3