An On-the-Fly Mandarin Singing Voice Synthesis System

Lin, Cheng-Yuan; Jang, J.-S. Roger; Hwang, Shaw-Hwa

doi:10.1007/3-540-36228-2_78

Conference paper
First Online: 16 December 2002

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2532)

Abstract

This paper proposes SINVOIS (singing voice synthesis), an on-the-fly Mandarin singing voice synthesis system. SINVOIS takes continuous speech of a song's lyrics as input and immediately generates the singing voice from the song's music score information, which is embedded in a MIDI file. The system contains two sub-systems: a synthesis unit generator and a pitch-shifting module. The synthesis unit generator applies the Viterbi decoding algorithm to the continuous speech to produce the synthesis units for the singing voice. The pitch-shifting module uses the PSOLA (pitch-synchronous overlap-add) method to shift pitch, and it also modifies the energy, duration, and spectrum of each synthesis unit. The synthesized singing voice sounds reasonably good: in a subjective listening test, it obtained a MOS (mean opinion score) of 3.1.
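
The abstract does not say how note pitches from the MIDI score are turned into pitch-shifting targets. As a rough illustration only, and not the authors' implementation, the sketch below converts a MIDI note number to a target frequency and derives the ratio by which a spoken syllable's fundamental frequency would need to be scaled. It assumes equal temperament with A4 = 440 Hz and assumes that an F0 estimate for the spoken unit is already available. The function names `midi_to_hz`, `pitch_shift_ratio`, and `pitch_mark_spacing` are hypothetical.

```python
def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number (note 69 is A4).

    Assumption: standard 12-tone equal temperament, A4 tuned to a4_hz.
    """
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def pitch_shift_ratio(source_f0_hz: float, target_note: int) -> float:
    """Factor by which the spoken unit's F0 must be scaled to hit the note.

    A ratio above 1 raises the pitch; a ratio below 1 lowers it.
    """
    if source_f0_hz <= 0:
        raise ValueError("source_f0_hz must be positive (voiced speech only)")
    return midi_to_hz(target_note) / source_f0_hz


def pitch_mark_spacing(sample_rate_hz: float, target_note: int) -> float:
    """Target distance, in samples, between successive pitch marks.

    In a PSOLA-style resynthesis, windowed pitch periods are overlap-added
    at roughly this spacing to produce the target fundamental frequency.
    """
    return sample_rate_hz / midi_to_hz(target_note)


if __name__ == "__main__":
    # A syllable spoken at about 220 Hz, to be sung on A4 (MIDI note 69).
    print(pitch_shift_ratio(220.0, 69))
    print(pitch_mark_spacing(16000.0, 69))
```

A complete pitch-shifting module would also need pitch-mark detection and windowed overlap-add, which this sketch leaves out.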

Author information

Authors and Affiliations

Dept. of Computer Science, National Tsing Hua University, Taiwan
Cheng-Yuan Lin & J.-S. Roger Jang
Dept. of Electrical Engineering, National Taipei University, Taiwan
Shaw-Hwa Hwang

Editor information

Editors and Affiliations

Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
Yung-Chang Chen
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Long-Wen Chang & Chiou-Ting Hsu

About this paper

Cite this paper

Lin, CY., Jang, JS.R., Hwang, SH. (2002). An On-the-Fly Mandarin Singing Voice Synthesis System. In: Chen, YC., Chang, LW., Hsu, CT. (eds) Advances in Multimedia Information Processing — PCM 2002. PCM 2002. Lecture Notes in Computer Science, vol 2532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36228-2_78

DOI: https://doi.org/10.1007/3-540-36228-2_78
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00262-8
Online ISBN: 978-3-540-36228-9
eBook Packages: Springer Book Archive
