
A Speech-Based Human-Computer Interaction System for Automating Directory Assistance Services

International Journal of Speech Technology

Abstract

Automating Directory Assistance Services (DAS) through speech is one of the most difficult and demanding applications of human-computer interaction, because it requires recognition over a very large vocabulary. In this paper, we present a spoken dialogue system for automating DAS. Given the major difficulties of this endeavor, we adopted a stepwise approach. In particular, two prototypes were developed in succession: D1.1 (basic approach) and D1.2 (improved version). The evaluation results of D1.1 were used to refine it, gradually leading to D1.2, which was in turn improved through a feedback approach. The system was then extended and optimized for use in real-world conditions. We describe in detail the general architecture and the three stages of the system's development. Evaluation results for both the speech recognizer's accuracy and the overall system's performance are provided for all prototypes. Finally, we focus on techniques for handling large-vocabulary recognition. The use of Directed Acyclic Word Graphs (DAWGs) and context-dependent phonological rules reduced the search space, which yielded faster response times, and also improved accuracy.
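The abstract credits DAWGs with shrinking the recognizer's search space. The abstract does not describe the authors' construction, so what follows is only a rough illustration. The Python sketch builds a minimal DAWG from a sorted word list using the standard incremental algorithm for sorted input, then compares its state count with that of a plain trie. The saving comes from sharing common suffixes as well as common prefixes. All names (`State`, `build_dawg`, `accepts`, `count_trie_states`) and the toy word list are hypothetical.

```python
from typing import Dict, List, Tuple


class State:
    """One state of the word graph; `edges` maps a symbol to a successor state."""

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False

    def signature(self) -> Tuple:
        # Equivalent states share finality and point to the same (already
        # canonical, hence alive) successors under the same labels, so id() is a safe key.
        return (self.final, tuple(sorted((sym, id(nxt)) for sym, nxt in self.edges.items())))


def build_dawg(words: List[str]) -> Tuple[State, int]:
    """Build a minimal DAWG from lexicographically sorted words.

    Returns the start state and the number of states in the minimized graph.
    """
    root = State()
    register: Dict[Tuple, State] = {}          # one canonical state per signature
    path: List[Tuple[State, str, State]] = []  # not-yet-minimized edges of the previous word
    previous = ""

    def minimize(keep: int) -> None:
        # Walk back along the previous word's path, merging each state into an
        # equivalent registered one (a shared suffix) or registering it as new.
        while len(path) > keep:
            parent, sym, child = path.pop()
            key = child.signature()
            if key in register:
                parent.edges[sym] = register[key]
            else:
                register[key] = child

    for word in words:
        if word < previous:
            raise ValueError("input must be sorted")
        common = 0
        while common < min(len(word), len(previous)) and word[common] == previous[common]:
            common += 1
        minimize(common)                       # the previous word's suffix can no longer change
        node = path[-1][2] if path else root   # end of the shared prefix
        for sym in word[common:]:
            child = State()
            node.edges[sym] = child
            path.append((node, sym, child))
            node = child
        node.final = True
        previous = word
    minimize(0)
    return root, len(register) + 1             # +1 for the start state


def accepts(root: State, word: str) -> bool:
    """Return True if `word` is in the lexicon encoded by the graph."""
    node = root
    for sym in word:
        nxt = node.edges.get(sym)
        if nxt is None:
            return False
        node = nxt
    return node.final


def count_trie_states(words: List[str]) -> int:
    # A trie has one state per distinct prefix, including the empty prefix.
    return len({w[:i] for w in words for i in range(len(w) + 1)})


if __name__ == "__main__":
    lexicon = sorted(["tap", "taps", "top", "tops"])  # toy lexicon, not data from the paper
    root, n_states = build_dawg(lexicon)
    print(n_states, count_trie_states(lexicon))  # 5 8
```

On this toy lexicon the trie needs 8 states, while the DAWG needs 5, because "tap"/"top" share their continuation "p", "ps". A recognizer that constrains its search with such a graph explores fewer partial hypotheses. How much this helps on a real directory listing depends on how much suffix overlap the names have.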




Cite this article

Georgila, K., Sgarbas, K., Tsopanoglou, A. et al. A Speech-Based Human-Computer Interaction System for Automating Directory Assistance Services. International Journal of Speech Technology 6, 145–159 (2003). https://doi.org/10.1023/A:1022338631326


  • DOI: https://doi.org/10.1023/A:1022338631326
