RAMCESS 2.X framework—expressive voice analysis for realtime and accurate synthesis of singing

Original Paper

Journal on Multimodal User Interfaces

Abstract

This paper presents the work carried out for the second version of the RAMCESS singing synthesis framework. The main improvement is the integration of new algorithms for expressive voice analysis, in particular the separation of the glottal source and the vocal tract. The realtime synthesis modules have also been refined. These elements have been integrated into an existing digital instrument, the HandSketch 1.x, a bi-manual controller. Finally, this digital instrument is compared with existing systems.

Author information

Corresponding author

Correspondence to Nicolas d’Alessandro.

About this article

Cite this article

d’Alessandro, N., Babacan, O., Bozkurt, B. et al. RAMCESS 2.X framework—expressive voice analysis for realtime and accurate synthesis of singing. J Multimodal User Interfaces 2, 133–144 (2008). https://doi.org/10.1007/s12193-008-0010-4

  • DOI: https://doi.org/10.1007/s12193-008-0010-4
