A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM

Published: 01 March 2005

Abstract

The feasibility of converting text into speech using an inexpensive computer with minimal memory is of great interest. Speech synthesizers have been developed for many popular languages (e.g., English, Chinese, Spanish, and French), but the design of a speech synthesizer depends largely on the structure of the target language. In this article, we develop a Persian synthesizer that includes an innovative text analyzer module. In the synthesizer, the text is segmented into words and, after preprocessing, a neural network is passed over each word. In addition to preprocessing, a new model, the smooth ergodic HMM (SEHMM), is used as a postprocessor to compensate for errors generated by the neural network. The performance of the proposed model is verified, and the intelligibility of the synthetic speech is assessed via listening tests.
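
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a NETtalk-style fixed-width letter window as the network input, and it uses a plain Viterbi decoder over a fully connected (ergodic) transition table as a stand-in for the SEHMM postprocessor. The function names, the window radius, and the padding symbol are all hypothetical.

```python
import math


def letter_windows(word, radius=3, pad="_"):
    """Return one context window per letter, centred on that letter.

    Assumption: the network sees a fixed-width window, as in NETtalk.
    """
    padded = pad * radius + word + pad * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(word))]


def viterbi(scores, trans, states):
    """Return the most likely state sequence.

    scores: list of dicts, one per letter, mapping state -> network score
    trans:  dict of dicts, trans[p][s] = probability of moving from p to s
    All scores and transition probabilities must be strictly positive,
    because the computation runs in the log domain.
    """
    best = [{s: math.log(scores[0][s]) for s in states}]
    back = []
    for t in range(1, len(scores)):
        col, ptr = {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans[p][s]))
            col[s] = best[-1][prev] + math.log(trans[prev][s]) + math.log(scores[t][s])
            ptr[s] = prev
        best.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy example with two hypothetical phoneme labels.
states = ["a", "b"]
trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}}
scores = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}, {"a": 0.9, "b": 0.1}]
smoothed = viterbi(scores, trans, states)
```

In the toy example, taking the highest network score at each position independently would label the middle letter "b", while the decoder prefers a sequence that is also consistent with the transition table. This illustrates how a sequence model can compensate for isolated errors in per-letter network outputs.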

Cited By

  • (2022) Persian speech synthesis using enhanced tacotron based on multi-resolution convolution layers and a convex optimization method. Multimedia Tools and Applications 81:3 (3629-3645). DOI: 10.1007/s11042-021-11719-w. Online publication date: 1-Jan-2022
  • (2010) Computational Intelligence in Speech and Audio Processing: Recent Advances. Soft Computing in Industrial Applications (303-311). DOI: 10.1007/978-3-642-11282-9_32. Online publication date: 2010
  • (2009) A comparison between allophone, syllable, and diphone based TTS systems for Kurdish language. 2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) (557-562). DOI: 10.1109/ISSPIT.2009.5407540. Online publication date: Dec-2009

      Published In

      ACM Transactions on Asian Language Information Processing (TALIP), Volume 4, Issue 1
      March 2005
      52 pages
      ISSN: 1530-0226
      EISSN: 1558-3430
      DOI: 10.1145/1066078

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Hidden Markov model
      2. TD-PSOLA

      Qualifiers

      • Article

      Cited By

      • (2009) Language-independent, neural network-based, text-to-phones conversion. Neurocomputing 73:1-3 (87-96). DOI: 10.1016/j.neucom.2008.08.023. Online publication date: 1-Dec-2009
      • (2009) Implementation of Three Text to Speech Systems for Kurdish Language. Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (321-328). DOI: 10.1007/978-3-642-10268-4_38. Online publication date: 15-Nov-2009
      • (2008) Computational Intelligence in Multimedia Processing: Foundation and Trends. Computational Intelligence in Multimedia Processing: Recent Advances (3-49). DOI: 10.1007/978-3-540-76827-2_1. Online publication date: 2008
