Abstract
This work presents a mining methodology designed to cope with the data scarcity problems common in intonation corpora, which arise from the high variability of prosodic information. The methodology adapts a basic agglomerative clustering technique, guided by a set of domain constraints. The peculiarities of the text-to-speech intonation modelling problem are taken into account to fix the initial configuration of the clustering and the criteria for merging classes and for stopping their splitting. The scarcity problem makes it necessary to apply a sequential selection mechanism over the prosodic features in order to obtain the initial set of classes. A search strategy is proposed to select the best class among a set of alternatives, yielding prediction models that are useful for generating accurate synthetic intonation. Visualizing the final classes by means of a modified decision tree provides graphical cues about contrastable prosodic information in the intonation corpus.
This work has been partially sponsored by the Spanish Government (MCYT project TIC2003-08382-C05-03) and by the Consejería de Educación (JCYL project VA053A05).
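The abstract does not specify the distance measure, the merging criterion, or the stopping rule, so the following is only a minimal sketch of the general idea of constraint-guided agglomerative merging, not the authors' algorithm. It assumes each initial class is labelled by a tuple of prosodic feature values and holds a list of numeric contour parameter vectors. The Euclidean centroid distance, the `min_samples` threshold, the function names, and the feature labels are all hypothetical choices made for illustration.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def merge_scarce_classes(classes, min_samples):
    """Merge under-populated classes into their nearest neighbour.

    classes: dict mapping a label (tuple of feature values) to a list of
             parameter vectors. Hypothetical data layout.
    min_samples: assumed scarcity constraint; merging stops once every
                 class has at least this many samples.
    Returns a dict keyed by frozensets of the original labels.
    """
    clusters = {frozenset([label]): list(vecs) for label, vecs in classes.items()}
    while len(clusters) > 1:
        scarce = [key for key, vecs in clusters.items() if len(vecs) < min_samples]
        if not scarce:
            break  # assumed stopping criterion: no class is too small
        # Take the smallest scarce class and find the class with the closest centroid.
        small = min(scarce, key=lambda key: len(clusters[key]))
        small_centroid = centroid(clusters[small])
        nearest = min(
            (key for key in clusters if key != small),
            key=lambda key: dist(small_centroid, centroid(clusters[key])),
        )
        # Replace the two classes with their union.
        clusters[small | nearest] = clusters.pop(small) + clusters.pop(nearest)
    return clusters


# Toy data: labels and values are invented for illustration only.
classes = {
    ("stressed", "final"): [(1.0, 2.0), (1.2, 2.1), (0.9, 1.9)],
    ("stressed", "initial"): [(1.1, 2.0)],
    ("unstressed", "final"): [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)],
}
merged = merge_scarce_classes(classes, min_samples=2)
for labels, vectors in merged.items():
    print(sorted(labels), len(vectors))
```

In this toy run the single-sample class is absorbed by the class whose centroid is closest, while the distant class is left untouched. A real system would replace the size threshold and distance with criteria derived from the domain knowledge the abstract refers to.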
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Escudero-Mancebo, D., Cardeñoso-Payo, V. (2006). Mining Intonation Corpora Using Knowledge Driven Sequential Clustering. In: Sichman, J.S., Coelho, H., Rezende, S.O. (eds) Advances in Artificial Intelligence - IBERAMIA-SBIA 2006. Lecture Notes in Computer Science, vol 4140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11874850_40
DOI: https://doi.org/10.1007/11874850_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45462-5
Online ISBN: 978-3-540-45464-9
eBook Packages: Computer Science (R0)