An On-the-Fly Mandarin Singing Voice Synthesis System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2532)

Abstract

This paper proposes an on-the-fly Mandarin singing voice synthesis system called SINVOIS (singing voice synthesis). SINVOIS receives continuous speech of a song's lyrics and immediately generates the singing voice from the song's music score information, which is embedded in a MIDI file. The system contains two sub-systems: a synthesis unit generator and a pitch-shifting module. The synthesis unit generator applies the Viterbi decoding algorithm to the continuous speech to produce the synthesis units for the singing voice. The pitch-shifting module uses the PSOLA method to shift pitch, and it also modifies the energy, duration, and spectrum of each synthesis unit. The synthesized singing voice sounds reasonably good: in a subjective listening test, the synthesized singing voices obtained a MOS (mean opinion score) of 3.1.
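
As an illustration of how score information in a MIDI file can drive a pitch-shifting module, the sketch below converts a MIDI note number to a target fundamental frequency and derives the ratio by which a spoken unit's pitch would need to be scaled. It is a minimal sketch, not the SINVOIS implementation: the function names are hypothetical, and it assumes equal temperament with A4 (MIDI note 69) tuned to 440 Hz and a single known source pitch per unit, none of which the abstract states.

```python
A4_MIDI_NOTE = 69      # assumption: standard MIDI numbering, A4 = note 69
A4_FREQUENCY_HZ = 440.0  # assumption: concert-pitch tuning


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number."""
    return A4_FREQUENCY_HZ * 2.0 ** ((note - A4_MIDI_NOTE) / 12.0)


def pitch_shift_factor(source_f0_hz: float, midi_note: int) -> float:
    """Ratio (target pitch / source pitch) a pitch shifter would apply.

    A factor above 1 raises the pitch; below 1 lowers it.
    """
    if source_f0_hz <= 0:
        raise ValueError("source_f0_hz must be positive")
    return midi_to_hz(midi_note) / source_f0_hz


def target_pitch_period(midi_note: int, sample_rate: int) -> int:
    """Spacing in samples between pitch marks at the target pitch.

    Overlap-add methods such as PSOLA re-space pitch-synchronous
    windows at roughly this interval to realize the new pitch.
    """
    return round(sample_rate / midi_to_hz(midi_note))


if __name__ == "__main__":
    # Hypothetical spoken syllable at 220 Hz, sung on middle C (note 60).
    print(pitch_shift_factor(220.0, 60))
    print(target_pitch_period(60, 16000))
```

The factor and the target period are two views of the same quantity; which one a pitch shifter consumes depends on how it places its analysis and synthesis pitch marks.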

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, CY., Jang, JS.R., Hwang, SH. (2002). An On-the-Fly Mandarin Singing Voice Synthesis System. In: Chen, YC., Chang, LW., Hsu, CT. (eds) Advances in Multimedia Information Processing — PCM 2002. PCM 2002. Lecture Notes in Computer Science, vol 2532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36228-2_78

  • DOI: https://doi.org/10.1007/3-540-36228-2_78

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00262-8

  • Online ISBN: 978-3-540-36228-9

  • eBook Packages: Springer Book Archive
