Mapping Phonetic Features for Voice-Driven Sound Synthesis

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 23)

Abstract

In applications where the human voice controls the synthesis of musical instrument sounds, phonetics convey musical information that may be related to the sound of the imitated instrument. Our initial hypothesis is that the phonetics employed are user- and instrument-dependent but remain constant for a given subject and instrument. We propose a user-adapted system in which mappings from voice features to synthesis parameters depend on how subjects sing musical articulations, i.e., note-to-note transitions. The system consists of two components: a voice signal segmentation module that automatically locates note-to-note transitions, and a classifier that determines the type of musical articulation of each transition from a set of phonetic features. To validate our hypothesis, we ran an experiment in which subjects imitated real instrument recordings with their voice. The recordings consisted of short saxophone and violin phrases performed in three grades of musical articulation, labeled staccato, normal, and legato. The results of a supervised classifier (user-dependent) are compared to those of a classifier based on heuristic rules (user-independent). Finally, building on these results, we show how to control articulation in a sample-concatenation synthesizer by selecting the most appropriate samples.
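As a rough illustration of the second component of the pipeline described above, the Python sketch below classifies a note-to-note transition as staccato, normal, or legato. The transition features (duration, energy dip, voiced-frame ratio), all thresholds, and the 1-nearest-neighbour stand-in for the supervised classifier are our own illustrative assumptions, not the phonetic feature set or classifier used in the paper.

```python
# Illustrative sketch (not the authors' implementation): labeling the
# musical articulation of a note-to-note transition from voice features.
# Feature names and thresholds are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class Transition:
    duration_ms: float    # length of the note-to-note transition segment
    energy_dip_db: float  # how far the signal energy drops mid-transition
    voiced_ratio: float   # fraction of voiced frames within the transition

def heuristic_articulation(t: Transition) -> str:
    """User-independent baseline: fixed rules over the transition features."""
    if t.energy_dip_db > 20.0 and t.voiced_ratio < 0.3:
        return "staccato"  # clear silent/unvoiced gap between the notes
    if t.energy_dip_db < 6.0 and t.voiced_ratio > 0.8:
        return "legato"    # notes connected by continuous phonation
    return "normal"

def knn_articulation(t: Transition, labelled: list) -> str:
    """User-dependent alternative: 1-nearest-neighbour over transitions the
    subject has already labelled (a minimal stand-in for a supervised
    classifier). `labelled` holds (Transition, label) pairs."""
    def dist(a: Transition, b: Transition) -> float:
        # Crudely normalized squared distance over the three features.
        return (((a.duration_ms - b.duration_ms) / 100.0) ** 2
                + ((a.energy_dip_db - b.energy_dip_db) / 10.0) ** 2
                + (a.voiced_ratio - b.voiced_ratio) ** 2)
    return min(labelled, key=lambda pair: dist(t, pair[0]))[1]

if __name__ == "__main__":
    train = [
        (Transition(40, 28.0, 0.10), "staccato"),
        (Transition(120, 4.0, 0.95), "legato"),
        (Transition(80, 12.0, 0.60), "normal"),
    ]
    query = Transition(50, 24.0, 0.20)
    print(heuristic_articulation(query))   # -> staccato
    print(knn_articulation(query, train))  # -> staccato
```

The point of contrasting the two functions is the paper's central comparison: the heuristic rules are fixed across users, while the nearest-neighbour variant adapts to however a particular subject happens to sing each articulation.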






Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Janer, J., Maestre, E. (2008). Mapping Phonetic Features for Voice-Driven Sound Synthesis. In: Filipe, J., Obaidat, M.S. (eds) E-business and Telecommunications. ICETE 2007. Communications in Computer and Information Science, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88653-2_23

  • DOI: https://doi.org/10.1007/978-3-540-88653-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88652-5

  • Online ISBN: 978-3-540-88653-2

  • eBook Packages: Computer Science (R0)
