Segment Specific Concatenation Cost for Syllable Based Bengali TTS

Narendra, N. P.; Rao, K. Sreenivasa

doi:10.1007/978-3-642-22606-9_38

Part of the book series: Communications in Computer and Information Science (CCIS, volume 168)

Included in the following conference series:

International Conference on Contemporary Computing

Abstract

This paper proposes a new method of calculating concatenation cost to improve the optimality of unit selection. Instead of defining the same set of concatenation costs for all types of speech-unit transitions, costs are defined according to the type of transition. The main types of unit transition that occur in an utterance are voiced-to-voiced, voiced-to-unvoiced and unvoiced-to-unvoiced. A natural measure of continuity is identified for each of these transitions, and the costs are defined accordingly. For voiced-to-voiced transitions, pitch and energy continuity metrics are proposed in addition to spectral continuity. For voiced-to-unvoiced and unvoiced-to-unvoiced transitions, the duration of silence embedded in the unvoiced region is proposed as the continuity metric. This segment-specific concatenation cost improves the quality of syllable-based text-to-speech synthesis. Listening tests demonstrate the effectiveness of the proposed method, showing a clear decrease in perceptual discontinuity at the joins and an improvement in the overall quality of the synthesised speech.
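The abstract specifies which continuity metric applies to which transition type, but not the exact formulas. The following is a minimal Python sketch of a segment-specific join cost under our own assumptions, not the authors' implementation: each candidate unit stores features of its boundary frames; spectral continuity is the Euclidean distance between boundary feature vectors; pitch mismatch is measured in semitones and energy mismatch in dB; and for joins involving an unvoiced region the cost is the deviation of the combined silence duration from a hypothetical expected value. The names `UnitBoundary`, `transition_type`, `concatenation_cost`, `expected_silence_s` and the weights `w_spec`, `w_pitch`, `w_energy` are illustrative and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UnitBoundary:
    """Hypothetical features of a candidate syllable unit at one of its edges.

    All field names are illustrative assumptions, not taken from the paper.
    """
    voiced: bool               # voicing decision for the boundary frame
    spectrum: Sequence[float]  # spectral feature vector (e.g. MFCCs) of the boundary frame
    f0_hz: float               # pitch of the boundary frame; must be > 0 when voiced
    energy_db: float           # frame energy in dB
    silence_s: float           # silence duration (s) inside the unvoiced boundary region


def transition_type(left_end: UnitBoundary, right_start: UnitBoundary) -> str:
    """Label a join by the voicing on each side: 'V-V', 'V-UV', 'UV-V' or 'UV-UV'."""
    left = "V" if left_end.voiced else "UV"
    right = "V" if right_start.voiced else "UV"
    return f"{left}-{right}"


def concatenation_cost(left_end: UnitBoundary, right_start: UnitBoundary,
                       expected_silence_s: float = 0.05,
                       w_spec: float = 1.0, w_pitch: float = 1.0,
                       w_energy: float = 1.0) -> float:
    """Assumed segment-specific join cost between the end of one unit and the start of the next."""
    if transition_type(left_end, right_start) == "V-V":
        # Voiced-to-voiced: spectral, pitch and energy continuity, combined as a weighted sum.
        spec = math.dist(left_end.spectrum, right_start.spectrum)          # Euclidean distance
        pitch = 12.0 * abs(math.log2(left_end.f0_hz / right_start.f0_hz))  # mismatch in semitones
        energy = abs(left_end.energy_db - right_start.energy_db)           # mismatch in dB
        return w_spec * spec + w_pitch * pitch + w_energy * energy
    # V-UV and UV-UV joins (and, as an extra assumption, UV-V): the only metric is how far
    # the silence around the join deviates from a hypothetical expected silence duration.
    total_silence = left_end.silence_s + right_start.silence_s
    return abs(total_silence - expected_silence_s)
```

The weights and `expected_silence_s` are placeholders that would have to be tuned on data, since spectral distances, semitones and dB live on different scales. In a unit-selection search, such as the Viterbi search over candidate units used in Hunt and Black's framework, a function like this would be evaluated for every pair of adjacent candidates, so the transition type decides which metric governs each join.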

Author information

Authors and Affiliations

School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
N. P. Narendra & K. Sreenivasa Rao

Editor information

Editors and Affiliations

Iowa State University and IIT Bombay, India, 329 Durham, Ames, IA 50011, Iowa, USA
Srinivas Aluru
Indian Statistical Institute, 203 B.T. Road, 700 108, Kolkata, West Bengal, India
Sanghamitra Bandyopadhyay
The Ohio State University, 3190 Graves Hall, 333 W 10th Ave, 43210, Columbus, OH, USA
Umit V. Catalyurek
Department of Computing Science, Chalmers University, Rännvägen 6B, 412 96, Göteborg, Sweden
Devdatt P. Dubhashi
Dept. of Electrical and Computer Engineering, Iowa State University, 329 Durham, IA 50011, Ames, USA
Phillip H. Jones
TASSL, Dept. of Electrical & Computer Engineering, Rutgers, The State University of New Jersey, Brett Road, NJ 08854-8058, Piscataway, USA
Manish Parashar
School of Computer Engineering, Nanyang Technological University, N4-02a-32 Nanyang Ave, 639798, Singapore
Bertil Schmidt

About this paper

Cite this paper

Narendra, N.P., Rao, K.S. (2011). Segment Specific Concatenation Cost for Syllable Based Bengali TTS. In: Aluru, S., et al. Contemporary Computing. IC3 2011. Communications in Computer and Information Science, vol 168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22606-9_38

DOI: https://doi.org/10.1007/978-3-642-22606-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22605-2
Online ISBN: 978-3-642-22606-9
eBook Packages: Computer Science, Computer Science (R0)