In Search of Sentence Boundaries in Spontaneous Speech

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

Oral text is certainly discrete. It is built of “small bricks”: units not only of the lexical level but also of the higher, syntactic level. This discreteness is marked by ordinary syntagmatic pauses and by hesitation pauses, which may be physical (unfilled pauses, including breaks within clauses), filled with sounds (e-e, m-m), or verbal (vot, kak eto, nu, znachit, etc.). However, these markers identify neither the syntagma nor the sentence as a unit for describing the syntactic structure of an oral text, because any type of pause may occur at any point in an audio sequence. The search for sentences in spontaneous speech is therefore quite complicated. To obtain such units, we propose a method of coercive punctuation, which was used to mark up spontaneous monologues from the collection of oral texts named «Balanced Annotated Textotec». The participants (experts in philology) were asked to mark the ends of sentences by placing periods in transcripts in which neither pauses nor punctuation had been marked. They could rely only on the syntactic structure of the text and on the connections between words and predicate centers. Involving more than twenty experts in the experiment makes the results statistically more reliable. In this work we describe the results of our experiment and discuss how they can be used for the automatic detection of sentence boundaries in spontaneous speech.
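As an illustration only, the expert markup described above could be aggregated as in the following minimal sketch. It assumes that each expert's annotation is stored as a set of token positions after which that expert placed a period. The function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Set


def boundary_agreement(n_tokens: int, annotations: Sequence[Set[int]]) -> List[float]:
    """Share of annotators who placed a period after each token position."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter()
    for marks in annotations:
        for pos in marks:
            if not 0 <= pos < n_tokens:
                raise ValueError(f"position {pos} is outside the transcript")
            counts[pos] += 1
    # Counter returns 0 for positions that no annotator marked.
    return [counts[i] / len(annotations) for i in range(n_tokens)]


def consensus_boundaries(agreement: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Positions where the share of annotators reaches the (assumed) threshold."""
    return [i for i, share in enumerate(agreement) if share >= threshold]


# Toy data (hypothetical): an 8-token transcript marked by four annotators.
annotations = [{4, 7}, {4, 7}, {1, 7}, {7}]
agreement = boundary_agreement(8, annotations)
boundaries = consensus_boundaries(agreement, threshold=0.5)
```

Here `agreement` holds one value per token position, and `boundaries` lists the positions that at least half of the annotators treated as a sentence end. Such per-position shares are one possible input for training or evaluating an automatic boundary detector; the threshold is a design choice, not a value reported by the authors.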

Author information

Corresponding author

Correspondence to Natalia Bogdanova-Beglarian.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bogdanova-Beglarian, N. (2017). In Search of Sentence Boundaries in Spontaneous Speech. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_45

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science (R0)
