Abstract
In this paper we present results from a study that seeks to distinguish "unprepared" from "prepared" speech in broadcast news media. The study builds on a previous experiment on the characterization of filled pauses and extensions, and extends the analysis of such hesitation phenomena to a large audio corpus. Daily news broadcasts from Portuguese television were manually segmented and labeled according to several speech styles, over a range of background environments. Automatic detection of filled pauses and extensions in this audio data allowed us to correlate the presence of hesitation events with segments of unprepared speech. Distinguishing unprepared speech from prepared speech is of considerable practical interest for audio segmentation, speech processing and linguistic research. The long-term objective of this work is to segment all audio genres and speaking styles automatically, and to identify prosodic and linguistic features of the speech segments.
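As an illustration of the correlation described above, the following minimal sketch computes the rate of hesitation events per minute for each labeled speech style. It is not the authors' method: the `Segment` type, the `hesitation_rate_by_style` function, the two style labels, and the sample timings are all hypothetical, and the event times stand in for the output of an automatic filled-pause detector.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A manually labeled stretch of audio (hypothetical structure)."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    style: str    # assumed labels: "prepared" or "unprepared"


def hesitation_rate_by_style(segments, event_times):
    """Return hesitation events per minute of audio for each style."""
    duration = {}  # total seconds per style
    count = {}     # total hesitation events per style
    for seg in segments:
        duration[seg.style] = duration.get(seg.style, 0.0) + (seg.end - seg.start)
        # Count detected events whose timestamp falls inside this segment.
        hits = sum(1 for t in event_times if seg.start <= t < seg.end)
        count[seg.style] = count.get(seg.style, 0) + hits
    return {s: 60.0 * count[s] / duration[s] for s in duration if duration[s] > 0}


# Invented sample data, for illustration only.
segments = [
    Segment(0.0, 30.0, "prepared"),
    Segment(30.0, 60.0, "unprepared"),
    Segment(60.0, 90.0, "prepared"),
]
events = [12.0, 33.5, 41.2, 47.9, 55.0]  # detector output, in seconds

rates = hesitation_rate_by_style(segments, events)
print(rates)
```

A markedly higher rate in one style than in the other is the kind of evidence that would support using hesitation events as a cue for unprepared speech; a real study would also need a statistical test and far more data than this toy example.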
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Veiga, A., Candeias, S., Celorico, D., Proença, J., Perdigão, F. (2012). Towards Automatic Classification of Speech Styles. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_47
DOI: https://doi.org/10.1007/978-3-642-28885-2_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)