Abstract
The automation of Directory Assistance Services (DAS) through speech is one of the most difficult and demanding applications of human-computer interaction because it involves very large vocabulary recognition. In this paper, we present a spoken dialogue system for automating DAS. Given the major difficulties of this endeavor, a stepwise approach was adopted. In particular, two prototypes, D1.1 (basic approach) and D1.2 (improved version), were developed successively. The results of the D1.1 evaluation were used to refine D1.1 and gradually led to D1.2, which was also improved using a feedback approach. Furthermore, the system was extended and optimized so that it can be used in real-world conditions. We describe the general architecture and the three stages of the system's development in detail. Evaluation results concerning both the speech recognizer's accuracy and the overall system's performance are provided for all prototypes. Finally, we focus on techniques that handle large vocabulary recognition issues. The use of Directed Acyclic Word Graphs (DAWGs) and context-dependent phonological rules reduced the search space and therefore the response time, and also improved accuracy.
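To illustrate why a DAWG shrinks the search space for a large name vocabulary, the following minimal sketch builds a plain trie and then merges equivalent subtrees so that common suffixes are stored only once. It is an assumption-laden illustration, not necessarily the construction method used by the authors: the names `Node`, `build_trie`, `minimize`, `count_states`, and `accepts` are hypothetical, the surnames are made-up sample data, and the bottom-up merging shown here is a simple textbook approach.

```python
from typing import Dict, List, Tuple


class Node:
    """One state of the word graph; `final` marks the end of a word."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False


def build_trie(words: List[str]) -> Node:
    """Build an ordinary trie: prefixes are shared, suffixes are not."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node: Node, registry: Dict[Tuple, Node]) -> Node:
    """Merge equivalent subtrees bottom-up, turning the trie into a DAWG."""
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimize(child, registry)
    # Two states are equivalent when they agree on finality and on every
    # (label, canonical target) pair. Canonical targets live in `registry`,
    # so their id() values stay valid for the whole run.
    signature = (
        node.final,
        tuple(sorted((ch, id(target)) for ch, target in node.edges.items())),
    )
    return registry.setdefault(signature, node)


def count_states(root: Node) -> int:
    """Count distinct states reachable from `root`."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


def accepts(root: Node, word: str) -> bool:
    """Return True if `word` is in the vocabulary encoded by the graph."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


# Hypothetical sample vocabulary: surnames sharing the suffix "opoulos".
words = ["papadopoulos", "georgopoulos", "nikolopoulos"]
trie_states = count_states(build_trie(words))
dawg = minimize(build_trie(words), {})
dawg_states = count_states(dawg)
print(f"trie states: {trie_states}, DAWG states: {dawg_states}")
```

Because the three sample surnames share a long common ending, the merged graph needs fewer states than the trie while accepting exactly the same words. In a directory with many thousands of names, this kind of suffix sharing is what keeps the recognizer's search network compact.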
Cite this article
Georgila, K., Sgarbas, K., Tsopanoglou, A. et al. A Speech-Based Human-Computer Interaction System for Automating Directory Assistance Services. International Journal of Speech Technology 6, 145–159 (2003). https://doi.org/10.1023/A:1022338631326
DOI: https://doi.org/10.1023/A:1022338631326