An Improvement of Prosodic Characteristics in Vietnamese Text to Speech System

Phan, Thanh Son; Dinh, Anh Tuan; Vu, Tat Thang; Luong, Chi Mai

doi:10.1007/978-3-319-02741-8_10

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 244)

Abstract

One important goal of a text-to-speech (TTS) system is to generate a natural-sounding synthesized voice. To meet this goal, a variety of tasks are performed to model the prosodic aspects of the TTS voice. The task discussed here is part-of-speech (POS) and intonation tagging. The paper examines the effects of POS and intonation information on the naturalness of hidden Markov model (HMM)-based synthesized speech when other resources are not available. It is found that, when a limited feature set is used for the HMM context labels, adding POS and intonation tags improves the naturalness of the synthesized voice.
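The abstract does not describe the label format itself. As a rough illustration only, the sketch below shows how POS and intonation tags might be appended to a deliberately limited set of syllable-level context features before HMM training, loosely in the spirit of HTS-style full-context labels. The field layout, the tone names, the `P`/`V` POS tags and the `DECL` intonation tag are all hypothetical assumptions, not the authors' actual label scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    text: str        # orthographic syllable
    tone: str        # lexical tone name, e.g. "ngang", "nang" (assumed naming)
    pos: str         # POS tag of the containing word (hypothetical tagset)
    intonation: str  # utterance-level intonation tag (hypothetical tagset)


def context_label(sylls: List[Syllable], i: int, use_prosody: bool = True) -> str:
    """Build a simplified, hypothetical context label for syllable i.

    Limited feature set: previous tone, current syllable, next tone,
    current tone, and the syllable's position in the utterance.
    With use_prosody=True, POS and intonation fields are appended.
    """
    def tone_at(j: int) -> str:
        # "x" marks an utterance boundary (no neighbouring syllable)
        return sylls[j].tone if 0 <= j < len(sylls) else "x"

    s = sylls[i]
    label = f"{tone_at(i - 1)}-{s.text}+{tone_at(i + 1)}/T:{s.tone}/P:{i + 1}_{len(sylls)}"
    if use_prosody:
        label += f"/POS:{s.pos}/INT:{s.intonation}"
    return label


# "Tôi đi học" ("I go to school"), a declarative utterance
utt = [
    Syllable("Tôi", "ngang", "P", "DECL"),
    Syllable("đi", "ngang", "V", "DECL"),
    Syllable("học", "nang", "V", "DECL"),
]

for i in range(len(utt)):
    print(context_label(utt, i))
```

Building the same utterance's labels with `use_prosody=False` and with `use_prosody=True` roughly mirrors the comparison described in the abstract: a limited context feature set versus the same set extended with POS and intonation tags.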

Author information

Authors and Affiliations

Faculty of Information Technology, Le Qui Don Technical University, 100 Hoang Quoc Viet Street, Cau Giay Dist., Hanoi City, Vietnam
Thanh Son Phan
Institute of Information Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Street, Cau Giay Dist., Hanoi City, Vietnam
Anh Tuan Dinh, Tat Thang Vu & Chi Mai Luong

Corresponding author

Correspondence to Thanh Son Phan.

Editor information

Editors and Affiliations

School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Van Nam Huynh
UMR CNRS 7253 Heudiasyc, Universite de Technologie de Compiegne, Compiegne Cedex, France
Thierry Denoeux
Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
Dang Hung Tran
Faculty of Information Technology, University of Engineering and Technology, Hanoi, Vietnam
Anh Cuong Le
Faculty of Information Technology, University of Engineering and Technology, Hanoi, Vietnam
Son Bao Pham

About this paper

Cite this paper

Phan, T.S., Dinh, A.T., Vu, T.T., Luong, C.M. (2014). An Improvement of Prosodic Characteristics in Vietnamese Text to Speech System. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-02741-8_10

DOI: https://doi.org/10.1007/978-3-319-02741-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02740-1
Online ISBN: 978-3-319-02741-8
eBook Packages: Engineering, Engineering (R0)

Publish with us

Policies and ethics