
One family, many voices: Can multiple synthetic voices be used as navigational cues in hierarchical interfaces?

International Journal of Speech Technology

Abstract

Many commercial applications use synthetic speech to convey information, and in many cases that information is hierarchically structured (e.g. menus). In this article, we describe the results of two experiments that examine the possibility of conveying hierarchies (families of trees) using multiple synthetic voices. We postulate that if hierarchical structures can be conveyed through synthetic speech, then navigation within those hierarchies can be improved. In the first experiment, we created hierarchies of 10 nodes with a depth of 3 levels and used synthetic voices to represent the nodes. A within-subjects study (N = 12) compared multiple synthetic voices against a single synthetic voice for locating the positions of nodes in a hierarchy. The multiple synthetic voices were created by manipulating synthetic-voice parameters according to a set of design principles. Results of the first experiment showed that subjects performed the tasks significantly better with multiple synthetic voices than with a single synthetic voice. To investigate the effect of multiple synthetic voices on more complex hierarchies, we conducted a second experiment: a 27-node hierarchy was created and a between-subjects study (N = 16) was carried out. Participants in this experiment recalled 84.38% of the nodes accurately. Results from these studies imply that multiple synthetic voices can be used effectively to represent, and to provide navigation cues in, interfaces structured as hierarchies.
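The core idea of the experiments, mapping each node's position in the hierarchy to a distinct set of synthetic-voice parameters, can be sketched as follows. The menu labels and the specific pitch/rate mapping below are illustrative assumptions for the sketch, not the design principles or parameter values used in the study:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def voice_params(path, base_pitch=110, pitch_step=30, base_rate=180, rate_step=-15):
    """Map a node's position (path of child indices from the root) to
    synthetic-voice parameters. Here, pitch distinguishes top-level
    branches and speaking rate distinguishes depth levels; the values
    are illustrative, not those from the study."""
    depth = len(path)
    branch = path[0] if path else 0
    return {
        "pitch_hz": base_pitch + branch * pitch_step,
        "rate_wpm": base_rate + depth * rate_step,
    }

def assign_voices(node, path=()):
    """Walk the hierarchy and attach a voice-parameter set to every node."""
    yield node.label, voice_params(list(path))
    for i, child in enumerate(node.children):
        yield from assign_voices(child, (*path, i))

# A 10-node, depth-3 hierarchy like those in the first experiment
# (hypothetical labels).
root = Node("Main menu", [
    Node("Accounts", [Node("Checking"), Node("Savings")]),
    Node("Loans", [Node("Mortgage"), Node("Auto")]),
    Node("Support", [Node("Billing"), Node("Technical")]),
])

for label, params in assign_voices(root):
    print(label, params)
```

Under this scheme, two nodes share a pitch exactly when they belong to the same top-level branch, and share a rate exactly when they sit at the same depth, so a listener could in principle infer a node's position from its voice alone.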



Author information

Corresponding author

Correspondence to Peer Shajahan.


About this article

Cite this article

Shajahan, P., Irani, P. One family, many voices: Can multiple synthetic voices be used as navigational cues in hierarchical interfaces?. Int J Speech Technol 9, 1–15 (2007). https://doi.org/10.1007/s10772-006-9000-7


Keywords

Navigation