Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances

Komatani, Kazunori; Matsuyama, Kyoko; Takeda, Ryu; Ogata, Tetsuya; Okuno, Hiroshi G.

doi:10.1007/978-1-4614-1335-6_31

Abstract

In real environments where automatic speech recognition (ASR) performance is not always high, robust speech interaction requires a strategy that does not rely solely on ASR results. We construct a spoken dialogue system that can enter an enumeration subdialogue, in which utterance timing as well as ASR results are used to interpret user utterances. Because utterance timing can be obtained more reliably than ASR results, the subdialogue enables more robust interaction even when ASR performance is low. We conducted an experiment with 31 participants. The results showed that, when ASR accuracy was not high, our system achieved a higher task completion rate than a baseline system that uses only ASR results.
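The idea of interpreting an utterance by its timing can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how timing and ASR results are combined, so the fallback rule below, the confidence threshold, the latency correction, and all names (`EnumeratedItem`, `item_at`, `interpret`) are assumptions made for illustration only. The sketch assumes the system reads out candidate items one by one and records when each item starts; if the ASR result is not trusted, the item being read when the user began speaking is taken as the user's choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnumeratedItem:
    label: str    # item the system reads out (hypothetical example data)
    start: float  # time in seconds at which the system starts reading it


def item_at(items: List[EnumeratedItem], onset: float,
            latency: float = 0.0) -> Optional[str]:
    """Return the most recently started item at the user's utterance onset.

    `latency` is an assumed correction for the user's reaction time.
    `items` is assumed to be sorted by start time.
    """
    t = onset - latency
    selected = None
    for item in items:
        if item.start <= t:
            selected = item.label
    return selected


def interpret(items: List[EnumeratedItem], onset: float,
              asr_label: Optional[str], asr_confidence: float,
              threshold: float = 0.7, latency: float = 0.0) -> Optional[str]:
    """Use the ASR result if it is confident and names an enumerated item;
    otherwise fall back to the utterance timing (assumed combination rule)."""
    labels = {item.label for item in items}
    if asr_label in labels and asr_confidence >= threshold:
        return asr_label
    return item_at(items, onset, latency)


items = [
    EnumeratedItem("news", 0.0),
    EnumeratedItem("weather", 2.0),
    EnumeratedItem("sports", 4.0),
]
# Low-confidence ASR result: the timing decides which item was meant.
choice = interpret(items, onset=2.5, asr_label="sports", asr_confidence=0.3)
```

In this sketch an utterance that starts 2.5 seconds into the enumeration is mapped to the second item regardless of what the recognizer returned, which is the kind of robustness to low ASR accuracy that the abstract describes.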

Acknowledgements

We are grateful to Prof. Shinsuke Mori and Mr. Tetsuro Sasada of Kyoto University for constructing the statistical language model for the system. This work was supported in part by KAKENHI (grant-in-aid for scientific research from the Ministry of Education, Culture, Sports, Science and Technology of Japan) and the JST PRESTO program.

Author information

Authors and Affiliations

Graduate School of Engineering, Nagoya University, Nagoya, Japan
Kazunori Komatani
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Kyoko Matsuyama, Ryu Takeda, Tetsuya Ogata & Hiroshi G. Okuno

Corresponding author

Correspondence to Kazunori Komatani.

Editor information

Editors and Affiliations

Dept. of Languages and Computer Systems, University of Granada, Granada, 18071, Spain
Ramón López-Cózar Delgado
Dept. of Computer Science & Engineering, Waseda University, Okubo 3-4-1, Tokyo, 169-8555, Japan
Tetsunori Kobayashi

About this paper

Cite this paper

Komatani, K., Matsuyama, K., Takeda, R., Ogata, T., Okuno, H.G. (2011). Evaluation of Spoken Dialogue System that uses Utterance Timing to Interpret User Utterances. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_31

DOI: https://doi.org/10.1007/978-1-4614-1335-6_31
Published: 12 August 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1334-9
Online ISBN: 978-1-4614-1335-6
eBook Packages: Engineering, Engineering (R0)