Abstract
In this paper we present an experimental study of how corpus-based automatic labeling of prosodic information can be transferred from a source language to a different target language. The Spanish ESMA corpus is used to train models that identify prominent words. The models are then used to identify the accented words of the English Boston University Radio News Corpus (BURNC). The inverse process (training the models with English data and testing on the Spanish corpus) is also evaluated, and the results are contrasted with those obtained in the conventional scenario: training and testing on the same corpus. We obtain correct annotation rates of up to 82.7% in cross-lingual experiments, which differ only slightly from the accuracy obtained in monolingual single-speaker scenarios (86.6% for Spanish and 80.5% for English). Speaker-independent monolingual recognition experiments have also been performed with the BURNC corpus, yielding cross-speaker recognition rates that range from 69.3% to 84.2%. As these results are comparable to the ones obtained in the cross-lingual scenario, we conclude that the approach we propose faces challenges similar to those found in speaker-independent scenarios.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Escudero-Mancebo, D., Aguilar, L., González Ferreras, C., Vivaracho Pascual, C., Cardeñoso-Payo, V. (2011). Cross-Lingual English Spanish Tonal Accent Labeling Using Decision Trees and Neural Networks. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_9
DOI: https://doi.org/10.1007/978-3-642-25020-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science (R0)