
Syntactic Filtering and Content-Based Retrieval of Twitter Sentences for the Generation of System Utterances in Dialogue Systems

Chapter in Situated Dialog in Speech-Based Human-Computer Interaction

Abstract

Sentences extracted from Twitter have been seen as a valuable resource for response generation in dialogue systems. However, selecting appropriate ones is difficult because such sentences are noisy. This paper proposes tackling this noise with syntactic filtering and content-based retrieval. Syntactic filtering ensures that a sentence has a structure that is valid for a system utterance, and content-based retrieval ensures that its content carries information relevant to the user utterance. Experimental results show that the proposed method can appropriately select high-quality Twitter sentences, significantly outperforming the baseline.
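The abstract describes a two-stage pipeline: discard Twitter sentences whose structure is unsuitable for a system utterance, then rank the survivors by relevance to the user utterance. The Python sketch below illustrates only that flow, under loudly stated assumptions; it is not the chapter's implementation. The surface rules in the hypothetical `passes_syntactic_filter` (rejecting URLs, mentions, hashtags and "RT", bounding length, requiring sentence-final punctuation) are illustrative stand-ins for the authors' syntactic filter, and bag-of-words cosine similarity is a generic substitute for their content-based retrieval. Whitespace tokenization also assumes English text; a language written without spaces between words would need a morphological analyzer instead. All function names and parameters are hypothetical.

```python
import math
import re
from collections import Counter

# ASSUMPTION: toy surface rules standing in for the chapter's syntactic filter.
NOISE_PATTERN = re.compile(r"https?://|@\w+|#\w+|\bRT\b")


def passes_syntactic_filter(sentence: str, min_tokens: int = 3, max_tokens: int = 30) -> bool:
    """Hypothetical check that a tweet sentence looks like a standalone utterance."""
    if NOISE_PATTERN.search(sentence):
        return False  # URLs, mentions, hashtags and retweet markers rarely make usable utterances
    tokens = sentence.split()  # ASSUMPTION: whitespace tokenization (English-only)
    if not (min_tokens <= len(tokens) <= max_tokens):
        return False
    return sentence.rstrip().endswith((".", "!", "?"))


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts used as a simple content representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_utterances(user_utterance: str, tweets: list[str], top_k: int = 3) -> list[str]:
    """Stage 1: filter tweets syntactically. Stage 2: rank survivors by content overlap."""
    query = bag_of_words(user_utterance)
    candidates = [t for t in tweets if passes_syntactic_filter(t)]
    scored = [(cosine(query, bag_of_words(t)), t) for t in candidates]
    scored = [(s, t) for s, t in scored if s > 0.0]  # keep only tweets sharing some content
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:top_k]]


tweets = [
    "Ramen in Sapporo is famous for its miso broth.",
    "lol ramen #foodie http://example.com",
    "Went hiking yesterday and the view was great!",
    "ramen",
]
print(select_utterances("I love ramen", tweets))
# → ['Ramen in Sapporo is famous for its miso broth.']
```

Filtering before ranking keeps malformed tweets out of the candidate pool, so a noisy tweet such as the hashtag-and-URL example cannot be selected merely because it shares a word with the user utterance.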



Acknowledgments

We thank Prof. Kohji Dohsaka of Akita Prefectural University for his helpful advice on statistical tests. We also thank Tomoko Izumi for her suggestions on how to write linguistic examples.

Author information

Correspondence to Ryuichiro Higashinaka.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Higashinaka, R. et al. (2016). Syntactic Filtering and Content-Based Retrieval of Twitter Sentences for the Generation of System Utterances in Dialogue Systems. In: Rudnicky, A., Raux, A., Lane, I., Misu, T. (eds) Situated Dialog in Speech-Based Human-Computer Interaction. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-21834-2_2


  • DOI: https://doi.org/10.1007/978-3-319-21834-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21833-5

  • Online ISBN: 978-3-319-21834-2

  • eBook Packages: Engineering, Engineering (R0)
