Creation of a Corpus of Training Sentences Based on Automated Dialogue Analysis

  • Conference paper
Text, Speech and Dialogue (TSD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

Developing computerized information-retrieval dialogue systems that communicate with the user in natural language requires an effective training procedure by which the main modules of the dialogue system can be developed partly automatically. This paper describes an attempt to create sentence templates automatically, using a program package that implements a specially developed method for the quantitative linguistic analysis of transcribed real dialogues. First, the package generates a set of formulas (templates) that consist of elements of a special grammar and describe the syntactic structure of the required sentences. Second, it generates a large corpus of unique training sentences from the sentence templates and a stochastic context-free grammar. The experimentally created corpus was used to train the modules of a city information dialogue system.
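
The second step described in the abstract, expanding sentence templates with a stochastic context-free grammar until enough unique sentences are collected, can be illustrated with a minimal sketch. This is not the authors' program package: the grammar, its probabilities, the templates, and the names `GRAMMAR`, `TEMPLATES`, `expand`, and `generate_corpus` are all hypothetical, and the toy vocabulary is English purely for readability.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of
# (expansion, probability) pairs. Symbols absent from the dict are terminals.
GRAMMAR = {
    "GREETING": [(["hello"], 0.6), (["good", "morning"], 0.4)],
    "REQUEST": [(["i", "need"], 0.5), (["i", "am", "looking", "for"], 0.5)],
    "OBJECT": [(["a", "PLACE"], 0.7), (["the", "nearest", "PLACE"], 0.3)],
    "PLACE": [(["pharmacy"], 0.4), (["hotel"], 0.3), (["tram", "stop"], 0.3)],
}

# Hypothetical sentence templates: sequences of nonterminals and literal words.
TEMPLATES = [
    ["GREETING", "REQUEST", "OBJECT"],
    ["REQUEST", "OBJECT", "please"],
]


def expand(symbol, rng):
    """Recursively expand one symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word, emitted as-is
    expansions = [rhs for rhs, _ in GRAMMAR[symbol]]
    weights = [prob for _, prob in GRAMMAR[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for part in chosen:
        words.extend(expand(part, rng))
    return words


def generate_corpus(n_sentences, seed=0, max_attempts=10_000):
    """Sample sentences from the templates, keeping only unique ones."""
    rng = random.Random(seed)
    corpus = set()
    attempts = 0
    while len(corpus) < n_sentences and attempts < max_attempts:
        template = rng.choice(TEMPLATES)
        words = []
        for symbol in template:
            words.extend(expand(symbol, rng))
        corpus.add(" ".join(words))  # the set discards duplicates
        attempts += 1
    return sorted(corpus)


if __name__ == "__main__":
    for sentence in generate_corpus(10, seed=1):
        print(sentence)
```

Collecting sentences in a set is one simple way to enforce uniqueness; the `max_attempts` bound keeps the loop from running forever when the grammar cannot produce as many distinct sentences as requested.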


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schwarz, J., Matoušek, V. (2001). Creation of a Corpus of Training Sentences Based on Automated Dialogue Analysis. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_56

  • DOI: https://doi.org/10.1007/3-540-44805-5_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42557-1

  • Online ISBN: 978-3-540-44805-1

  • eBook Packages: Springer Book Archive
