A small-footprint context-independent HMM-based synthesizer for Tamil

Anushiya Rachel, G.; Sherlin Solomi, V.; Naveenkumar, K.; Vijayalakshmi, P.; Nagarajan, T.

doi:10.1007/s10772-015-9278-4

Published: 03 April 2015

Volume 18, pages 405–418, (2015)

International Journal of Speech Technology

Abstract

A text-to-speech synthesis system produces intelligible and natural speech corresponding to any given text. Two main attributes of a synthesizer are the quality of the speech produced and the footprint size. In the current work, HMM-based speech synthesizers are built and assessed using various kinds of phone-sized units, namely monophone, triphone, triphone with contextual features, pentaphone, and pentaphone with contextual features. The quality of synthetic speech improves as context is added, from a mean opinion score (MOS) of 2.4 for a synthesizer that uses monophones to 3.98 for one that uses pentaphones with 48 additional contextual features (pentaphone+). However, the footprint size also increases, from 269 kB to 1840 kB, with the addition of contextual information. Depending on the intended application, a compromise therefore has to be made on either the quality or the footprint size. Analysis reveals that although speech synthesized by a monophone-based system lacks naturalness, it is intelligible. The lack of naturalness is primarily due to discontinuities in the pitch contour. An attempt is therefore made to improve the quality of the synthesized speech by smoothing the pitch contour, with the aim of retaining the small footprint size while attaining the quality of a synthesizer that uses contextual information. Smoothing the pitch contour at the word level yields the best quality, with an MOS of 3.4. Further, a preference test reveals that 71.25% of the sentences are similar in quality to the speech synthesized by a pentaphone+ HTS, while 5% are better.
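
The abstract does not say which smoothing algorithm is used, so the following is only a minimal sketch of what word-level pitch smoothing could look like, not the authors' method. It assumes a per-frame F0 track in Hz in which 0.0 marks unvoiced frames, word boundaries given as frame index pairs, and a simple centered moving average as the smoother. The names `moving_average`, `smooth_f0_by_word`, `word_bounds`, and `window` are hypothetical.

```python
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def smooth_f0_by_word(
    f0: List[float],
    word_bounds: List[Tuple[int, int]],
    window: int = 5,
) -> List[float]:
    """Smooth the voiced F0 frames of each word independently.

    f0          -- per-frame pitch in Hz; 0.0 marks an unvoiced frame (assumption)
    word_bounds -- (start, end) frame indices per word, end exclusive (assumption)
    window      -- moving-average length in frames (assumption)
    """
    smoothed = list(f0)
    for start, end in word_bounds:
        # Only voiced frames take part; unvoiced frames stay at 0.0.
        voiced_idx = [i for i in range(start, end) if f0[i] > 0.0]
        if len(voiced_idx) < 2:
            continue
        voiced_vals = [f0[i] for i in voiced_idx]
        for i, value in zip(voiced_idx, moving_average(voiced_vals, window)):
            smoothed[i] = value
    return smoothed


# Toy contour: a 40 Hz jump inside the first word, then a second word.
contour = [100.0, 100.0, 140.0, 140.0, 0.0, 0.0, 200.0, 200.0, 200.0]
words = [(0, 4), (4, 9)]
print(smooth_f0_by_word(contour, words, window=3))
```

Because each word is smoothed on its own, the jump inside the first word is spread over several frames while the pitch reset between the two words is left untouched. Post-processing of this kind adds no model parameters, which is consistent with the abstract's aim of keeping the monophone footprint.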

Acknowledgments

The authors would like to thank the Department of Information Technology, Ministry of Communication and Information Technology, Government of India, for funding the project on Development of text-to-speech synthesis systems for Indian languages Phase II, Ref. no. 11(7)/2011-HCC(TDIL).

Author information

Authors and Affiliations

SSN College of Engineering, Chennai, India
G. Anushiya Rachel, V. Sherlin Solomi, K. Naveenkumar, P. Vijayalakshmi & T. Nagarajan

Corresponding author

Correspondence to G. Anushiya Rachel.

About this article

Cite this article

Anushiya Rachel, G., Sherlin Solomi, V., Naveenkumar, K. et al. A small-footprint context-independent HMM-based synthesizer for Tamil. Int J Speech Technol 18, 405–418 (2015). https://doi.org/10.1007/s10772-015-9278-4

Received: 01 December 2014
Accepted: 23 March 2015
Published: 03 April 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10772-015-9278-4
