Towards Automatic Classification of Speech Styles

Veiga, Arlindo; Candeias, Sara; Celorico, Dirce; Proença, Jorge; Perdigão, Fernando

doi:10.1007/978-3-642-28885-2_47


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7243)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language


Abstract

In this paper we present results from a study seeking to distinguish "unprepared" from "prepared" speech in broadcast news media. The idea is to build on the results of a previous experiment concerning the characterization of filled pauses and extensions, extending the analysis of these hesitation phenomena to a large audio corpus. Daily news broadcasts from Portuguese television were manually segmented and labeled according to several speech styles, across a range of background environments. Automatic detection of filled pauses and extensions in this audio data allowed us to correlate the presence of hesitation events with segments of unprepared speech. Distinguishing unprepared from prepared speech is of considerable practical interest for audio segmentation, speech processing, and linguistic research. The long-term objective of this work is to automatically segment all audio genres and speaking styles, as well as to identify prosodic and linguistic features of the speech segments.
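The correlation step described above pairs automatically detected hesitation events with manually labeled segments. The following is a minimal sketch of one way to measure that relationship. It assumes that the segment boundaries, the style labels and the timestamps of detected filled pauses and extensions are already available. The abstract does not say how the authors computed the correlation, so the `Segment` type, the per-minute hesitation rate, the function names and the threshold classifier are all hypothetical illustrations, not the paper's method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    """A manually labeled stretch of broadcast audio (hypothetical structure)."""
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    style: str    # assumed labels: "prepared" or "unprepared"


def hesitation_rate(seg: Segment, events: list[float]) -> float:
    """Detected hesitation events (filled pauses/extensions) per minute inside seg."""
    count = sum(1 for t in events if seg.start <= t < seg.end)
    minutes = (seg.end - seg.start) / 60.0
    return count / minutes if minutes > 0 else 0.0


def mean_rate_by_style(segments: list[Segment], events: list[float]) -> dict[str, float]:
    """Average hesitation rate for each speech-style label."""
    rates: dict[str, list[float]] = {}
    for seg in segments:
        rates.setdefault(seg.style, []).append(hesitation_rate(seg, events))
    return {style: mean(values) for style, values in rates.items()}


def classify(seg: Segment, events: list[float], threshold: float) -> str:
    """Naive rule (hypothetical): a high hesitation rate suggests unprepared speech."""
    return "unprepared" if hesitation_rate(seg, events) >= threshold else "prepared"
```

With toy data (invented for illustration, not taken from the paper), the segments `(0, 60, "prepared")`, `(60, 120, "unprepared")` and `(120, 150, "unprepared")` and the events at 30, 65, 70, 90, 125 and 140 s give mean rates of 1.0 and 3.5 hesitations per minute. In practice, any threshold would have to be tuned on labeled data.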



Author information

Authors and Affiliations

DEEC, Instituto de Telecomunicações, pole of Coimbra, Coimbra, Portugal
Arlindo Veiga, Sara Candeias, Dirce Celorico, Jorge Proença & Fernando Perdigão
DEEC/FCTUC, Universidade de Coimbra, Coimbra, Portugal
Fernando Perdigão


Editor information

Editors and Affiliations

UFSCAR, Rod. Washington Luís, 13565-905, São Carlos, Brazil
Helena Caseli
UFRGS, Av. Bento Gonçalves, 9500, 91501-970, Porto Alegre, Brazil
Aline Villavicencio
DETI/IEETA, Universidade de Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
António Teixeira
UC/IT, DEEC, Universidade de Coimbra, Polo 2, 3030-290, Coimbra, Portugal
Fernando Perdigão


About this paper

Cite this paper

Veiga, A., Candeias, S., Celorico, D., Proença, J., Perdigão, F. (2012). Towards Automatic Classification of Speech Styles. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_47

DOI: https://doi.org/10.1007/978-3-642-28885-2_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science (R0)
