A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM

Authors: F. Hendessi, A. Ghayoori, T. A. Gulliver

ACM Transactions on Asian Language Information Processing (TALIP), Volume 4, Issue 1

Pages 38 - 52

https://doi.org/10.1145/1066078.1066081

Published: 01 March 2005

Abstract

The feasibility of converting text into speech on an inexpensive computer with minimal memory is of great interest. Speech synthesizers have been developed for many widely spoken languages (e.g., English, Chinese, Spanish, and French), but the design of a speech synthesizer depends largely on the structure of the target language. In this article, we develop a Persian synthesizer that includes an innovative text analyzer module. In the synthesizer, the text is segmented into words and, after preprocessing, a neural network is run over each word. In addition to this preprocessing, a new model, the smooth ergodic hidden Markov model (SEHMM), is used as a postprocessor to compensate for errors generated by the neural network. The performance of the proposed model is verified, and the intelligibility of the synthetic speech is assessed via listening tests.
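The abstract states only that a neural network is run over each preprocessed word and that an SEHMM postprocessor compensates for the network's errors; the internals of the SEHMM are not given here. As a rough illustration of these two stages, the Python sketch below shows (1) NETtalk-style fixed-width letter windows, one centred on each letter of a word, which is one common way to run a network over a word (the approach of Sejnowski and Rosenberg's NETtalk, listed in the references), and (2) Viterbi decoding over an ergodic HMM whose transition probabilities are additively smoothed so that every state can follow every other, used to clean up per-letter network scores. The window half-width, the padding symbol `_`, all function names, the Laplace smoothing, and the use of network scores directly as emission probabilities are assumptions made for illustration; this is not the authors' SEHMM.

```python
import math
from typing import Dict, List, Sequence, Tuple

PAD = "_"  # hypothetical word-boundary padding symbol (assumption)


def letter_windows(word: str, half_width: int = 3) -> List[str]:
    """One fixed-width window per letter, centred on that letter.

    In a NETtalk-style setup the network sees each window and predicts the
    phoneme of the centre letter; width = 2 * half_width + 1.
    """
    padded = PAD * half_width + word + PAD * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(word))]


def smoothed_transitions(
    pairs: Sequence[Tuple[str, str]], states: Sequence[str], alpha: float = 1.0
) -> Dict[str, Dict[str, float]]:
    """Estimate transition probabilities with additive (Laplace) smoothing.

    Every transition receives nonzero probability, so the resulting chain is
    ergodic: each state can be reached from every other state.
    """
    counts = {r: {s: 0 for s in states} for r in states}
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    trans = {}
    for r in states:
        total = sum(counts[r].values()) + alpha * len(states)
        trans[r] = {s: (counts[r][s] + alpha) / total for s in states}
    return trans


def _log(p: float) -> float:
    # log(0) is mapped to -inf so impossible paths are never chosen
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    scores: Sequence[Dict[str, float]],
    states: Sequence[str],
    trans: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Most likely state sequence under the HMM, computed in the log domain.

    scores[t][s] is used directly as the emission probability of position t
    in state s; treating network outputs this way is a simplification.
    """
    delta = {s: _log(start[s]) + _log(scores[0][s]) for s in states}
    backptrs: List[Dict[str, str]] = []
    for t in range(1, len(scores)):
        prev, delta, ptr = delta, {}, {}
        for s in states:
            # best predecessor r for state s at position t
            best = max(states, key=lambda r: prev[r] + _log(trans[r][s]))
            delta[s] = prev[best] + _log(trans[best][s]) + _log(scores[t][s])
            ptr[s] = best
        backptrs.append(ptr)
    # trace back from the best final state
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(letter_windows("sib", half_width=2))  # ['__sib', '_sib_', 'sib__']
    states = ["x", "y"]
    # Toy training pairs: only self-transitions observed.
    trans = smoothed_transitions([("x", "x")] * 8 + [("y", "y")] * 8, states)
    scores = [{"x": 0.9, "y": 0.1}, {"x": 0.4, "y": 0.6}, {"x": 0.9, "y": 0.1}]
    print([max(s, key=s.get) for s in scores])  # ['x', 'y', 'x']
    print(viterbi(scores, states, trans, {"x": 0.5, "y": 0.5}))  # ['x', 'x', 'x']
```

In the toy run, a per-letter argmax labels the middle position `y`, but the smoothed transitions favour staying in `x`, so the decoded path is `x, x, x`; this is the kind of error correction the abstract attributes to the postprocessor. A practical system would use Persian letters and phoneme labels and estimate every probability from data.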

References

[1] Ainsworth, W. A. 1973. A system for converting English text into speech. IEEE Trans. Audio and Electroacoustics 21 (1973), 288--290.

[2] Bagshaw, P. C. 1998. Phonemic transcription by analogy in text-to-speech synthesis: Novel word pronunciation and lexicon compression. Computational Linguistics 12 (1998), 119--142.

[3] El-Imam, Y. A. 1989. An unrestricted vocabulary Arabic speech synthesis system. IEEE Trans. Acoustics, Speech, and Signal Processing 37 (1989), 1829--1845.

[4] Embrechts, M. J. and Arciniegas, F. 2000. Neural networks for text-to-speech phoneme recognition. In Proceedings of the IEEE Systems, Man and Cybernetics Conference. IEEE Society, 2000. 3582--3587.

[5] Lee, L.-S., Tseng, C.-Y., and Hsieh, C.-J. 1993. Improved tone concatenation rules in a formant-based Chinese text-to-speech system. IEEE Trans. Speech and Audio Processing 1 (1993), 287--294.

[6] Moulines, E. and Charpentier, F. 1990. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9 (1990), 453--467.

[7] Rabiner, L. R. 1977. On the use of autocorrelation analysis for pitch detection. IEEE Trans. Acoustics, Speech, and Signal Processing 25 (1977), 24--33.

[8] Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (1989), 257--286.

[9] Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., and McGonegal, C. A. 1976. A comparative performance study of several pitch detection algorithms. IEEE Trans. Acoustics, Speech, and Signal Processing 24 (1976), 399--418.

[10] Sejnowski, T. J. and Rosenberg, C. R. 1987. NETtalk: Parallel networks that learn to pronounce English text. Complex Systems 1 (1987), 145--168.

[11] Selim, H. and Anbar, T. 1986. A phonetic transcription system of Arabic text. IBM Cairo Scientific Center Tech. Rep. 25.

[12] Sproat, R., Hu, J., and Chen, H. 1998. Emu: An e-mail preprocessor for text-to-speech. In Proceedings of the IEEE Workshop on Multimedia Signal Processing, 1998. 239--244.

[13] Wu, C.-H. and Chen, J.-H. 1997. Speech activated telephony e-mail reader (SATER) based on speaker verification and text-to-speech conversion. IEEE Trans. Consumer Electronics 43 (1997), 707--716.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information

Published In

ACM Transactions on Asian Language Information Processing Volume 4, Issue 1

March 2005

52 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/1066078

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Article Metrics

Total citations: 6
Total downloads: 907
Downloads (last 12 months): 4
Downloads (last 6 weeks): 0

Reflects downloads up to 16 Feb 2025

Cited By

Naderi N, Nasersharif B, Nikoofard A (2022). Persian speech synthesis using enhanced tacotron based on multi-resolution convolution layers and a convex optimization method. Multimedia Tools and Applications 81:3, 3629-3645. DOI: 10.1007/s11042-021-11719-w. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1007/s11042-021-11719-w

Hassanien A, Schaefer G, Darwish A (2010). Computational Intelligence in Speech and Audio Processing: Recent Advances. Soft Computing in Industrial Applications, 303-311. DOI: 10.1007/978-3-642-11282-9_32. Online publication date: 2010.
https://doi.org/10.1007/978-3-642-11282-9_32

Barkhoda W, ZahirAzami B, Bahrampour A, Shahryari O (2009). A comparison between allophone, syllable, and diphone based TTS systems for Kurdish language. 2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 557-562. DOI: 10.1109/ISSPIT.2009.5407540. Online publication date: Dec-2009.
https://doi.org/10.1109/ISSPIT.2009.5407540

Malcangi M, Frontini D (2009). Language-independent, neural network-based, text-to-phones conversion. Neurocomputing 73:1-3, 87-96. DOI: 10.1016/j.neucom.2008.08.023. Online publication date: 1-Dec-2009.
https://dl.acm.org/doi/10.1016/j.neucom.2008.08.023

Bahrampour A, Barkhoda W, Azami B (2009). Implementation of Three Text to Speech Systems for Kurdish Language. Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 321-328. DOI: 10.1007/978-3-642-10268-4_38. Online publication date: 15-Nov-2009.
https://dl.acm.org/doi/10.1007/978-3-642-10268-4_38

Hassanien A, Abraham A, Kacprzyk J, Peters J (2008). Computational Intelligence in Multimedia Processing: Foundation and Trends. Computational Intelligence in Multimedia Processing: Recent Advances, 3-49. DOI: 10.1007/978-3-540-76827-2_1. Online publication date: 2008.
https://doi.org/10.1007/978-3-540-76827-2_1
