Abstract
In real environments, where automatic speech recognition (ASR) performance is not always high, robust speech interaction requires an alternative to relying solely on ASR results. We construct a spoken dialogue system that can enter an enumeration subdialogue, in which utterance timing as well as the ASR results is used to interpret user utterances. Since utterance timing can be obtained more reliably than ASR results, the subdialogue achieves more robust interaction even when ASR performance is low. We conducted an experiment with 31 participants. The results showed that, when ASR accuracy was not high, our system achieved a higher task completion rate than a baseline system that uses only the ASR results.
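As a rough illustration of how utterance timing could be used to interpret a user utterance during an enumeration subdialogue, the following minimal sketch maps the onset time of a user utterance to the item the system was reading at that moment. It is not the system described in the paper: the names `ItemSpan`, `item_at_time`, and `interpret`, the confidence threshold, and the rule for combining timing with the ASR result are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ItemSpan:
    """Time window (seconds from the start of the enumeration) for one item.

    Hypothetical structure: the paper's abstract does not specify how
    item boundaries are represented.
    """
    label: str
    start: float
    end: float


def item_at_time(spans: Sequence[ItemSpan], onset: float) -> Optional[str]:
    """Return the item being enumerated when the user started speaking."""
    for span in spans:
        if span.start <= onset < span.end:
            return span.label
    return None  # the utterance began outside every item window


def interpret(
    spans: Sequence[ItemSpan],
    onset: float,
    asr_label: Optional[str] = None,
    asr_confidence: float = 0.0,
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> Optional[str]:
    """Combine ASR and timing with an assumed rule.

    Trust the ASR result when its confidence is high; otherwise fall
    back to the item selected by utterance timing.
    """
    if asr_label is not None and asr_confidence >= threshold:
        return asr_label
    return item_at_time(spans, onset)


# Example enumeration: the system reads three items, two seconds each.
spans = [
    ItemSpan("news", 0.0, 2.0),
    ItemSpan("weather", 2.0, 4.0),
    ItemSpan("sports", 4.0, 6.0),
]
```

With these windows, a low-confidence ASR hypothesis is overridden by timing: `interpret(spans, 4.5, "news", 0.3)` selects the item being read at 4.5 seconds, while a high-confidence hypothesis is kept as is.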
Acknowledgements
We are grateful to Prof. Shinsuke Mori and Mr. Tetsuro Sasada of Kyoto University for constructing the statistical language model for the system. This work was supported in part by KAKENHI (grant-in-aid for scientific research from the Ministry of Education, Culture, Sports, Science and Technology of Japan) and the JST PRESTO program.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this paper
Cite this paper
Komatani, K., Matsuyama, K., Takeda, R., Ogata, T., Okuno, H.G. (2011). Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_31
DOI: https://doi.org/10.1007/978-1-4614-1335-6_31
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1334-9
Online ISBN: 978-1-4614-1335-6
eBook Packages: Engineering, Engineering (R0)