What we have and what is needed, how to evaluate Arabic Speech Synthesizer?

Abu Doush, Iyad; Alkhatib, Faisal; Bsoul, Abed Al Raoof

doi:10.1007/s10772-015-9304-6

Published: 06 April 2016

Volume 19, pages 415–432, (2016)

International Journal of Speech Technology

Iyad Abu Doush¹,
Faisal Alkhatib¹ &
Abed Al Raoof Bsoul¹

An Erratum to this article was published on 27 June 2016

Abstract

Arabic is one of the six official languages of the United Nations. Arabic language processing, and speech synthesis in particular, is challenging because of the inherent complexity of the language's text and characters and because each letter may have up to seven different sounds. In this paper, we provide subjective and objective evaluations of six speech synthesizer applications available on the Internet for Arabic: Acapela, ISpeech, Arabi, Sakhr, Google, and Nuance. For the subjective evaluation, we performed four intelligibility tests: Diagnostic Rhyme, Modified Rhyme, Phonetically Confusable Sentences, and Automatic Diacritization Intelligibility (ADI). The ADI test, which we propose, measures how well a speech engine predicts diacritical marks from the context of a word in a sentence. Two further tests evaluate other features of the speech engines. The first, Arabic Text with All Sounds (ATAS), evaluates different features when the speech engine reads Arabic text containing all the sounds of the different Arabic letters. The second, Best/Worst Pleasant Voice, which we also propose, determines the best and worst speech engines in terms of voice pleasantness. For the objective evaluation, we evaluate the output of the six systems and compare the results with the subjective evaluations. The comparison is made by computing objective metrics from the signal generated by each system and a reference signal (i.e., the same text spoken by a human). Two types of measurement are used as objective metrics: a signal-to-noise measure (segmental SNR) and a linear predictive (LP-based) measure. The originality of the evaluation lies in its use of Arabic text (diacritized and non-diacritized) containing all the sounds of the Arabic letters. Another novelty is the introduction of two tests, ADI and ATAS, for evaluating Arabic speech synthesizers. Results from subject users are provided to measure clearness/naturalness, speed, sound quality, pronunciation, clearness, stress/intonation, pronunciation errors, intelligibility, and pleasantness. In addition, results from experts are presented to measure the articulation of each sound, the number of words not pronounced, and the reading speed. The results reveal the need for Arabic speech synthesizers that take diacritization into account to enhance system performance. They also point to the importance of an accurate automatic diacritization system that generates diacritized text to be synthesized. The results show the significance of a human-like voice for the speech synthesizer. We propose a set of recommendations for improving Arabic speech synthesizers.
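
To make the objective comparison more concrete, the following is a minimal sketch of one common formulation of segmental SNR, not the authors' implementation. It assumes the synthesized and human reference signals are already time-aligned and sampled at the same rate, which real recordings generally are not without prior alignment. The function name `segmental_snr`, the frame length of 256 samples, and the clamping range of -10 to 35 dB are illustrative assumptions.

```python
import math


def segmental_snr(reference, synthesized, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between two time-aligned signals.

    Assumptions (not from the paper): non-overlapping frames of
    frame_len samples, per-frame values clamped to [lo_db, hi_db].
    """
    n = min(len(reference), len(synthesized))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        syn = synthesized[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - s) ** 2 for r, s in zip(ref, syn))
        if signal_energy == 0.0:
            continue  # skip frames where the reference is silent
        if noise_energy == 0.0:
            snr_db = hi_db  # identical frames: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        frame_snrs.append(max(lo_db, min(hi_db, snr_db)))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Averaging per-frame values in dB, rather than computing one SNR over the whole utterance, keeps loud segments from dominating the score; the clamp limits the influence of frames that are nearly silent or nearly identical.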

Acknowledgments

This research project is funded by the Jordanian Scientific Research Support Fund, No. EIT/1/05/2011. Thanks to Prof. Sameer Istetiah of the Arabic department at Yarmouk University, a well-known expert in the Arabic language, who provided us with this text.

Author information

Authors and Affiliations

Department of Computer Science, Yarmouk University, Irbid, Jordan
Iyad Abu Doush, Faisal Alkhatib & Abed Al Raoof Bsoul

Corresponding author

Correspondence to Iyad Abu Doush.

Appendix: Questionnaire for testing speech synthesizer application

Clearness/naturalness

Q1) Is the voice nice to listen to?
1. Very natural
2. Natural
3. OK
4. Unnatural
5. Very unnatural

Speed

Q2) Does the system speak at an adequate speed?
1. Much too fast
2. Too fast
3. Fast/normal
4. Too slow
5. Much too slow

Sound quality

Q3) Do you consider the system to have good sound quality?
1. Very bad
2. Bad
3. Neutral
4. Good
5. Very good

Pronunciation

Q4) Was it easy to understand some of the words?
1. Very hard
2. Hard
3. Neutral
4. Easy
5. Very easy

Q5) Did you have to concentrate a lot to understand the speech produced by the voice?
1. A lot of attention was needed
2. Some attention was needed for some words
3. Normal attention
4. Little attention
5. No attention was needed

Q6) How did you find the pronunciation?
1. Extremely annoying
2. Very annoying
3. Annoying
4. Slightly annoying
5. Not annoying

Clearness

Q7) How clear is the voice?
1. Very little
2. Little
3. Neutral
4. Much
5. Very much

Q8) Was the voice easy to understand?
1. Very hard
2. Hard
3. Neutral
4. Easy
5. Very easy

Stress/intonation

Q9) What do you think of the intonation of the voice?
1. Very bad
2. Bad
3. Neutral
4. Good
5. Very good

Q10) How did you find the stress?
1. Extremely annoying
2. Very annoying
3. Annoying
4. Slightly annoying
5. Not annoying

Finding error

Q11) Does the system make many pronunciation mistakes?
1. Too many
2. Many
3. Neutral
4. Few
5. Very few
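
As an illustration only, responses to the questionnaire above could be summarized as a mean option number per category. The sketch below is a hypothetical aggregation, not the scoring procedure used in the paper; the names `CATEGORIES` and `category_means` are assumptions. The scales do not all run in the same direction (Q1 runs from best to worst, and Q2 has its ideal answer in the middle), so raw means are comparable only within a category.

```python
from statistics import mean

# Question numbers grouped under the appendix headings.
CATEGORIES = {
    "Clearness/naturalness": [1],
    "Speed": [2],
    "Sound quality": [3],
    "Pronunciation": [4, 5, 6],
    "Clearness": [7, 8],
    "Stress/intonation": [9, 10],
    "Finding error": [11],
}


def category_means(responses):
    """Average the chosen option numbers (1-5) per category.

    responses: list of dicts, one per participant, mapping a
    question number to the option that participant selected.
    Categories with no answers are omitted from the result.
    """
    result = {}
    for name, questions in CATEGORIES.items():
        scores = [r[q] for r in responses for q in questions if q in r]
        if scores:
            result[name] = mean(scores)
    return result
```

For example, calling `category_means` with one dict per participant for a given speech engine yields one summary value per appendix heading for that engine.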

About this article

Cite this article

Abu Doush, I., Alkhatib, F. & Bsoul, A.A.R. What we have and what is needed, how to evaluate Arabic Speech Synthesizer?. Int J Speech Technol 19, 415–432 (2016). https://doi.org/10.1007/s10772-015-9304-6

Received: 20 April 2015
Accepted: 12 August 2015
Published: 06 April 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10772-015-9304-6
