Creation of a Corpus of Training Sentences Based on Automated Dialogue Analysis

Schwarz, Jana; Matoušek, Václav

doi:10.1007/3-540-44805-5_56

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Developing computerized information-retrieval dialogue systems that communicate with the user in natural language requires an effective training procedure with which the main modules of the dialogue system can be developed partly automatically. This paper describes an attempt to create sentence templates automatically, using a program package that implements a specially developed method of quantitative linguistic analysis of transcribed real dialogues. First, the program package generates a set of formulas (templates) that consist of elements of a special grammar and describe the syntactic structure of the required sentences. Second, it generates a large corpus of unique training sentences from the sentence templates and a stochastic context-free grammar. The experimentally created corpus was used to train modules of a city information dialogue system.
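
The second step of the abstract, expanding sentence templates with a stochastic context-free grammar into a corpus of unique sentences, can be illustrated with a minimal Python sketch. The grammar, the templates, the vocabulary, and the function names `expand` and `generate_corpus` below are hypothetical; the actual grammar and templates are not shown in this preview, so the sketch only demonstrates the general technique and not the authors' program package.

```python
import random

# Hypothetical toy grammar (an assumption, not taken from the paper):
# each nonterminal maps to a list of (probability, right-hand side) rules.
GRAMMAR = {
    "GREETING": [(0.6, ["hello"]), (0.4, ["good", "morning"])],
    "REQUEST": [(0.5, ["where", "is"]), (0.5, ["how", "do", "i", "get", "to"])],
    "PLACE": [(0.4, ["the", "PLACE_NAME"]), (0.6, ["PLACE_NAME"])],
    "PLACE_NAME": [(0.5, ["theatre"]), (0.3, ["town", "hall"]),
                   (0.2, ["railway", "station"])],
}

# Hypothetical sentence templates: sequences of grammar elements.
TEMPLATES = [["GREETING", "REQUEST", "PLACE"], ["REQUEST", "PLACE"]]


def expand(symbol, rng):
    """Recursively expand a symbol; symbols not in GRAMMAR are terminals."""
    if symbol not in GRAMMAR:
        return [symbol]
    rules = GRAMMAR[symbol]
    weights = [prob for prob, _ in rules]
    # Pick one rule at random, weighted by its probability.
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for part in rhs:
        words.extend(expand(part, rng))
    return words


def generate_corpus(templates, n, rng, max_attempts=10000):
    """Sample sentences until n unique ones are found or attempts run out."""
    corpus = set()
    attempts = 0
    while len(corpus) < n and attempts < max_attempts:
        attempts += 1
        template = rng.choice(templates)
        sentence = " ".join(w for sym in template for w in expand(sym, rng))
        corpus.add(sentence)  # the set discards duplicate sentences
    return sorted(corpus)


if __name__ == "__main__":
    for line in generate_corpus(TEMPLATES, 10, random.Random(0)):
        print(line)
```

Collecting sentences in a set is one simple way to keep the corpus unique; the attempt limit prevents an endless loop when the grammar cannot yield as many distinct sentences as requested.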

Author information

Authors and Affiliations

Institute of Slavistics, Technical University of Dresden, Germany
Jana Schwarz
Dept. of Computer Science, University of West Bohemia in Plzeň, Czech Republic
Václav Matoušek

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, University of West Bohemia in Plzeň, Faculty of Applied Sciences, Univerzitní 22, 306-14, Plzeň, Czech Republic
Václav Matoušek, Pavel Mautner, Roman Mouček & Karel Taušer

About this paper

Cite this paper

Schwarz, J., Matoušek, V. (2001). Creation of a Corpus of Training Sentences Based on Automated Dialogue Analysis. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_56

DOI: https://doi.org/10.1007/3-540-44805-5_56
Published: 24 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive
