Abstract
This paper presents a real-time system for human-machine spoken dialogue over the telephone in task-oriented domains. The system has been tested in a large field trial with inexperienced users, and it proved robust enough to support spontaneous interaction even with speakers for whom recognition performance was poor. This robustness was achieved by combining specific language models during the recognition phase, tolerance of spontaneous speech phenomena, a robust parser, and pragmatics-based dialogue knowledge. Integrating these modules allows the system to cope with partial or total breakdowns at other levels of analysis. We report field trial data on the speech recognition metrics of word accuracy and sentence understanding rate, on time-to-completion, on time-to-acquisition of crucial parameters, and on the degree of success of the interactions in providing speakers with the information they required. The evaluation data show that most subjects were able to interact fruitfully with the system. These results suggest that the design choices made to achieve robust behaviour are a promising way to build usable spoken language telephone systems.
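As an illustration of the first metric named above, the following is a minimal sketch of how word accuracy is conventionally computed: one minus the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The abstract does not give the exact formula used in the trial, so the definition, the function name `word_accuracy`, and the example utterances are assumptions made for illustration only.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Conventional word accuracy: 1 - (S + D + I) / N.

    Assumption: this is the standard definition, not necessarily the
    exact scoring procedure used in the field trial.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # Word-level Levenshtein distance, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    return 1.0 - errors / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance pair: one of six reference words is dropped.
    print(word_accuracy("a train to milan at nine", "a train to milan nine"))
```

Because insertions count as errors, this measure can fall below zero when the recognizer outputs many spurious words.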
Cite this article
Albesano, D., Baggia, P., Danieli, M. et al. A robust system for human-machine dialogue in telephony-based applications. Int J Speech Technol 2, 101–111 (1997). https://doi.org/10.1007/BF02208822