Syntactic Filtering and Content-Based Retrieval of Twitter Sentences for the Generation of System Utterances in Dialogue Systems

Higashinaka, Ryuichiro; Kobayashi, Nozomi; Hirano, Toru; Miyazaki, Chiaki; Meguro, Toyomi; Makino, Toshiro; Matsuo, Yoshihiro

doi:10.1007/978-3-319-21834-2_2

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Sentences extracted from Twitter have been seen as a valuable resource for response generation in dialogue systems. However, selecting appropriate sentences is difficult because they are noisy. This paper proposes tackling such noise with syntactic filtering and content-based retrieval. Syntactic filtering ensures that a sentence has a structure that is valid for a system utterance, and content-based retrieval ensures that its content is relevant to the user utterance. Experimental results show that the proposed method can appropriately select high-quality Twitter sentences, significantly outperforming the baseline.
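The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the abstract does not specify the filter or the retrieval model, so the regular-expression noise check, the punctuation and length rule, the Jaccard word-overlap score, and all function names below are assumptions chosen only to show the shape of a filter-then-retrieve pipeline.

```python
import re

# Hypothetical noise patterns (URLs, @mentions, #hashtags); a stand-in
# for a real syntactic filter, which the abstract does not describe.
NOISE = re.compile(r"(https?://\S+|[@#]\w+)")


def tokenize(text):
    """Lowercase word tokens; a toy replacement for morphological analysis."""
    return re.findall(r"[a-z0-9']+", text.lower())


def passes_syntactic_filter(sentence):
    """Toy structural check: no noise tokens, 3+ words, ends in . ! or ?"""
    if NOISE.search(sentence):
        return False
    if len(tokenize(sentence)) < 3:
        return False
    return sentence.rstrip().endswith((".", "!", "?"))


def content_score(user_utterance, sentence):
    """Jaccard overlap of token sets, used here as an assumed relevance measure."""
    u, s = set(tokenize(user_utterance)), set(tokenize(sentence))
    if not u or not s:
        return 0.0
    return len(u & s) / len(u | s)


def select_utterances(user_utterance, candidates, top_k=1):
    """Filter candidates structurally, then rank survivors by content overlap."""
    valid = [c for c in candidates if passes_syntactic_filter(c)]
    ranked = sorted(
        valid, key=lambda c: content_score(user_utterance, c), reverse=True
    )
    # Drop candidates that share no words with the user utterance.
    return [c for c in ranked if content_score(user_utterance, c) > 0][:top_k]


candidates = [
    "lol ramen http://example.com",
    "@friend ramen tonight?",
    "Ramen in Sapporo is really good.",
    "The weather is nice today.",
]
result = select_utterances("I love ramen", candidates)
print(result)
```

In this toy run the first two candidates are rejected by the structural check, and the weather sentence is dropped because it shares no words with the user utterance. A realistic system would presumably replace both stages with trained or parser-based components.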

Acknowledgments

We thank Prof. Kohji Dohsaka of Akita Prefectural University for his helpful advice on statistical tests. We also thank Tomoko Izumi for her suggestions on how to write linguistic examples.

Author information

Authors and Affiliations

NTT Media Intelligence Laboratories, Kanagawa, Japan
Ryuichiro Higashinaka, Nozomi Kobayashi, Toru Hirano, Chiaki Miyazaki, Toshiro Makino & Yoshihiro Matsuo
NTT Communication Science Laboratories, Kyoto, Japan
Toyomi Meguro

Corresponding author

Correspondence to Ryuichiro Higashinaka.

Editor information

Editors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Alexander Rudnicky
Cupertino, California, USA
Antoine Raux
Silicon Valley, Carnegie Mellon University, Moffett Field, California, USA
Ian Lane
Mountain View, California, USA
Teruhisa Misu

About this chapter

Cite this chapter

Higashinaka, R. et al. (2016). Syntactic Filtering and Content-Based Retrieval of Twitter Sentences for the Generation of System Utterances in Dialogue Systems. In: Rudnicky, A., Raux, A., Lane, I., Misu, T. (eds) Situated Dialog in Speech-Based Human-Computer Interaction. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-21834-2_2

DOI: https://doi.org/10.1007/978-3-319-21834-2_2
Published: 21 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21833-5
Online ISBN: 978-3-319-21834-2
eBook Packages: Engineering, Engineering (R0)
