Towards Automatic Classification of Speech Styles

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2012)

Abstract

In this paper we present results from a study that seeks to distinguish "unprepared" from "prepared" speech in broadcast news media. The idea is to build on the results of a previous experiment on the characterization of filled pauses and extensions, extending the analysis of such hesitation phenomena to a large audio corpus. Daily news broadcasts from Portuguese television were manually segmented and labeled according to several speech styles, over a range of background environments. Automatic detection of filled pauses and extensions in this audio data allowed us to correlate the presence of hesitation events with segments of unprepared speech. Distinguishing unprepared from prepared speech is of considerable practical interest for audio segmentation, speech processing and linguistic research. The long-term objective of this work is to automatically segment all audio genres and speaking styles, as well as to identify prosodic and linguistic features of the speech segments.
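The correlation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that manual segments carry a start time, an end time and a style label, and that the hesitation detector outputs a list of event timestamps. The `Segment` class, the function names and the toy data are all hypothetical and are not taken from the paper; the rate-per-minute measure is one simple choice, not necessarily the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    style: str    # manual label, e.g. "prepared" or "unprepared"


def hesitation_rate(segment, hesitation_times):
    """Return detected hesitation events per minute inside one segment."""
    duration = segment.end - segment.start
    if duration <= 0:
        return 0.0
    count = sum(segment.start <= t < segment.end for t in hesitation_times)
    return 60.0 * count / duration


def mean_rate_by_style(segments, hesitation_times):
    """Average the per-segment hesitation rate for each style label."""
    rates = {}
    for seg in segments:
        rates.setdefault(seg.style, []).append(
            hesitation_rate(seg, hesitation_times)
        )
    return {style: sum(r) / len(r) for style, r in rates.items()}


# Toy data, invented for illustration only (not results from the paper).
segments = [
    Segment(0.0, 60.0, "prepared"),
    Segment(60.0, 120.0, "unprepared"),
    Segment(120.0, 180.0, "prepared"),
]
hesitations = [65.0, 80.5, 101.2, 130.0]  # detector timestamps, in seconds

rates = mean_rate_by_style(segments, hesitations)
print(rates)  # {'prepared': 0.5, 'unprepared': 3.0}
```

On real data, a markedly higher mean rate for the unprepared label would be consistent with the association the abstract describes; a statistical test over many segments would be needed before drawing any conclusion.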

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Veiga, A., Candeias, S., Celorico, D., Proença, J., Perdigão, F. (2012). Towards Automatic Classification of Speech Styles. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_47

  • DOI: https://doi.org/10.1007/978-3-642-28885-2_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28884-5

  • Online ISBN: 978-3-642-28885-2

  • eBook Packages: Computer Science, Computer Science (R0)
