Abstract
Sentences extracted from Twitter are seen as a valuable resource for response generation in dialogue systems. However, selecting appropriate ones is difficult because the sentences are noisy. This paper proposes tackling this noise with syntactic filtering and content-based retrieval. Syntactic filtering ensures that a sentence has a valid structure for use as a system utterance, and content-based retrieval ensures that its content is relevant to user utterances. Experimental results show that the proposed method selects high-quality Twitter sentences, significantly outperforming the baseline.
Acknowledgments
We thank Prof. Kohji Dohsaka of Akita Prefectural University for his helpful advice on statistical tests. We also thank Tomoko Izumi for her suggestions on how to write linguistic examples.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Higashinaka, R. et al. (2016). Syntactic Filtering and Content-Based Retrieval of Twitter Sentences for the Generation of System Utterances in Dialogue Systems. In: Rudnicky, A., Raux, A., Lane, I., Misu, T. (eds) Situated Dialog in Speech-Based Human-Computer Interaction. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-21834-2_2
DOI: https://doi.org/10.1007/978-3-319-21834-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21833-5
Online ISBN: 978-3-319-21834-2
eBook Packages: Engineering, Engineering (R0)