Abstract
The paper describes PARS, a manually annotated corpus of spoken Russian built specifically for training parsing algorithms and extracting grammars from Russian spontaneous speech. The PARS corpus includes multiple annotation levels, ranging from signal-level boundaries of word forms and discourse units to syntactic structure representations that follow the Universal Dependencies standard. The presented results include a detailed description of the corpus structure, the principles of annotation, and the annotation levels.
Notes
- 4. The models are freely available in the MANASLU8 repository: https://github.com/MANASLU8.
Acknowledgments
This work was financially supported by the Russian Foundation for Basic Research (RFBR), Grant No. 16-36-60055.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kovriguina, L., Shilin, I., Putintseva, A., Shipilo, A. (2018). Multilevel Annotation in the Corpus for Parsing Russian Spontaneous Speech. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_33
DOI: https://doi.org/10.1007/978-3-319-99579-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)