A Speech-Based Human-Computer Interaction System for Automating Directory Assistance Services

Georgila, K.; Sgarbas, K.; Tsopanoglou, A.; Fakotakis, N.; Kokkinakis, G.

doi:10.1023/A:1022338631326

Published: April 2003

International Journal of Speech Technology, Volume 6, pages 145–159 (2003)

K. Georgila¹, K. Sgarbas¹, A. Tsopanoglou², N. Fakotakis¹ & G. Kokkinakis¹

Abstract

The automation of Directory Assistance Services (DAS) through speech is one of the most difficult and demanding applications of human-computer interaction because it involves very large vocabulary recognition. In this paper, we present a spoken dialogue system for automating DAS. Given the major difficulties of this endeavor, a stepwise approach was adopted: two prototypes, D1.1 (basic approach) and D1.2 (improved version), were developed in succession. The results of the D1.1 evaluation were used to refine D1.1 and gradually led to D1.2, which was further improved using a feedback approach. The system was then extended and optimized so that it can be used in real-world conditions. We describe the general architecture and the three stages of the system's development in detail. Evaluation results for both the speech recognizer's accuracy and the overall system's performance are provided for all prototypes. Finally, we focus on techniques that handle large vocabulary recognition issues. The use of Directed Acyclic Word Graphs (DAWGs) and context-dependent phonological rules reduced the search space and therefore the response time, and also improved accuracy.
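
To illustrate why a DAWG can shrink the search space for a large name vocabulary, the following is a minimal sketch in Python. It is not the authors' construction algorithm: it simply builds a trie and then merges equivalent subtrees bottom-up, which is one straightforward way to obtain a minimal acyclic word graph. All names (`Node`, `build_trie`, `minimize`, `count_nodes`, `accepts`) and the sample surnames are assumptions made for the example.

```python
class Node:
    """One state of the word graph (hypothetical helper class)."""

    def __init__(self):
        self.children = {}   # letter -> Node
        self.final = False   # True if a word ends at this state


def build_trie(words):
    """Build a plain trie: common prefixes are shared, suffixes are not."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up, turning the trie into a DAWG.

    Two states are equivalent when they agree on `final` and have the
    same outgoing letters leading to the same (already merged) states.
    The registry keeps every canonical state alive, so id() is stable.
    """
    for ch, child in list(node.children.items()):
        node.children[ch] = minimize(child, registry)
    key = (node.final,
           tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
    return registry.setdefault(key, node)


def count_nodes(root):
    """Count distinct states reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.children.values())
    return len(seen)


def accepts(root, word):
    """Return True if the graph contains exactly this word."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.final


if __name__ == "__main__":
    # Illustrative vocabulary only; surnames with a shared ending.
    names = ["fakotakis", "kokkinakis", "sgarbas", "georgila"]
    trie = build_trie(names)
    trie_size = count_nodes(trie)       # count before minimizing in place
    dawg = minimize(trie, {})
    print("trie states:", trie_size)
    print("DAWG states:", count_nodes(dawg))
```

Because surnames in a directory tend to share endings as well as beginnings, merging common suffixes removes states that a plain trie would duplicate, while the set of accepted words stays the same.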

Author information

Authors and Affiliations

Wire Communications Laboratory, Electrical and Computer Engineering Department, University of Patras, Greece
K. Georgila, K. Sgarbas, N. Fakotakis & G. Kokkinakis
LogicDIS Group, Knowledge S.A., Patras, Greece
A. Tsopanoglou

About this article

Cite this article

Georgila, K., Sgarbas, K., Tsopanoglou, A. et al. A Speech-Based Human-Computer Interaction System for Automating Directory Assistance Services. International Journal of Speech Technology 6, 145–159 (2003). https://doi.org/10.1023/A:1022338631326

Issue Date: April 2003
DOI: https://doi.org/10.1023/A:1022338631326