Abstract
In this paper we present the work that has been achieved in the context of the second version of the Ramcess singing synthesis framework. The main improvement of this study is the integration of new algorithms for expressive voice analysis, especially the separation of the glottal source and the vocal tract. Realtime synthesis modules have also been refined. These elements have been integrated in an existing digital instrument: the HandSketch 1.x, a bi-manual controller. Moreover this digital instrument is compared to existing systems.
Similar content being viewed by others
References
Bonada J, Serra X (2007) Synthesis of the singing voice by performance sampling and spectral models. IEEE Signal Process 24(2):67–79
Kawahara H (1999) Restructuring speech representations using a pitch-adaptative time-frequency smoothing and an instantaneous-frequency-based f0 extraction: possible role of a repetitive structure in sounds. Speech Commun 27:187–207
Makhoul J (1975) Linear prediction: a tutorial review. Proc IEEE 63:561–580
Bozkurt B (2005) New spectral methods for the analysis of source/filter characteristics of speech signals. PhD thesis, Faculté Polytechnique de Mons
Henrich N (2001) Etude de la source glottique en voix parlée et chantée: modélisation et estimation, mesures acoustiques et electroglottographiques, perception. PhD thesis, Université de Paris VI
Doval B, d’Alessandro C, Henrich N (2006) The spectrum of glottal flow models. Acta Acustica 92:1026–1046
Doval B, d’Alessandro C (2003) The voice source as a causal/anticausal linear filter. In: Proceedings of Voqual’03, voice quality: functions, analysis and synthesis, ISCA workshop
Sundberg J (1974) Articulatory interpretation of the singing formant. J Acoust Soc Am 55:838–844
Boite R, Bourlard H, Dutoit T, Hancq J, Leich H (2000) Traitement de la parole
Bozkurt B, Couvreur L, Dutoit T (2007) Chirp group delay analysis of speech signals. Speech Commun 49(3):159–176
Dubuisson T, Dutoit T (2007) Improvement of source-tract decomposition of speech using analogy with LF model for glottal source and tube model for vocal tract. In: Proceedings of models and analysis of vocal emissions for biomedical application workshop, pp 119–122
Edelman A, Murakami H (1995) Polynomial roots from companion matrix eigenvalues. Math Comput 64(210):763–776
Bozkurt B, Doval B, d’Alessandro C, Dutoit T (2005) Zeros of the Z-transform representation with application to source-filter separation in speech. IEEE Signal Process Lett 12(4):344–347
Fant G, Liljencrants J, Lin Q (1985) A four-parameter model of glottal flow. STL-QPSR 4:1–13
Fant G (1960) Acoustic theory of speech production. Mouton and Co, Netherlands
Vincent D, Rosec O, Chonavel T (2005) Estimation of LF glottal source parameters based on ARX model. In: Proceedings of Interspeech, Lisbonne, pp 333–336
Vincent D, Rosec O, Chonavel T (2007) A new method for speech synthesis and transformation based on an ARX-LF source-filter decomposition and HNM modeling. In: Proceedings of ICASSP, Honolulu, pp 525–528
d’Alessandro N, Dutoit T (2007) HandSketch bi-manual controller. In: Proceedings of NIME, pp 78–81
Schwarz D, Wright M (2000) Extensions and applications of the SDIF sound description interchange format. In: International computer music conference
d’Alessandro N, Doval B, Beux SL, Woodruff P, Fabre Y, d’Alessandro C, Dutoit T (2007) Realtime and accurate musical control of expression in singing synthesis. J Multimodal User Interfaces 1(1):31–39
d’Alessandro N, Dutoit T (2007) RAMCESS/HandSketch: a multi-representation framework for realtime and expressive singing synthesis. In: Proceedings of Interspeech’07, pp TuC. SS–5
Birkholz P, Steiner I, Breuer S (2007) Control concepts for articulatory speech synthesis. In: Proceedings of the 6th ISCA workshop on speech synthesis
Berndtsson G, Sundberg J (1993) The MUSSE DIG singing synthesis. In: Proceedings of the Stockholm music acoustics conference, pp 279–281
d’Alessandro N, Dubuisson T, Moinet A, Dutoit T (2007) Causal/anticausal decomposition for mixed-phase description of brass and bowed string sounds. In: Proceedings of international computer music conference, pp 465–468
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
d‘Alessandro, N., Babacan, O., Bozkurt, B. et al. RAMCESS 2.X framework—expressive voice analysis for realtime and accurate synthesis of singing. J Multimodal User Interfaces 2, 133–144 (2008). https://doi.org/10.1007/s12193-008-0010-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12193-008-0010-4