What we have and what is needed, how to evaluate Arabic Speech Synthesizer?

International Journal of Speech Technology

An Erratum to this article was published on 27 June 2016

Abstract

Arabic is one of the six official languages of the United Nations. Arabic language processing, and speech synthesis in particular, is a challenging task because of the inherent complexity of Arabic text and characters, and because each letter may have up to seven different sounds. In this paper, we provide a subjective and an objective evaluation of six Arabic speech synthesizer applications available on the Internet: Acapela, ISpeech, Arabi, Sakhr, Google, and Nuance. For the subjective evaluation, the authors performed four intelligibility tests: the Diagnostic Rhyme test, the Modified Rhyme test, the Phonetically Confusable Sentences test, and a fourth test proposed by the authors, Automatic Diacritization Intelligibility (ADI), which tests how well the speech engine predicts the diacritization marks of a word from its context in the sentence. Two further tests evaluated other features of the speech engines. The first, the Arabic Text with All Sounds (ATAS) test, evaluates different features when the speech engine reads an Arabic text that contains all the sounds of the different Arabic letters. The second, Best/Worst Pleasant Voice, is proposed by the authors to determine the best and worst speech engines in terms of voice pleasantness. In the objective evaluation, we evaluate the output of the six systems and compare the results with those of the subjective evaluations. The comparison is made by computing objective metrics from both the sound generated by each system and a reference signal (i.e., the same text spoken by a human). Two objective metrics are used: the segmental signal-to-noise ratio (segmental SNR) and a linear prediction based (LP-based) measure. The originality of the evaluation is that it uses an Arabic text (diacritized and non-diacritized) containing all the sounds of Arabic letters.
Another novelty is that we introduce two tests, ADI and ATAS, for evaluating Arabic speech synthesizers. Results from test subjects are provided to measure clearness/naturalness, speed, sound quality, pronunciation, clearness, stress/intonation, pronunciation errors, intelligibility, and pleasantness. In addition, results from experts are presented to measure the articulation of each sound, the number of words not pronounced, and the reading speed. The results reveal the need for Arabic speech synthesizers that take diacritization into account to improve system performance. They also point to the importance of an accurate automatic diacritization system that generates diacritized text for synthesis. Finally, the results show the significance of a human-like voice for a speech synthesizer. We propose a set of recommendations for improving Arabic speech synthesizers.
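The abstract names segmental SNR as one of the two objective metrics, computed between each system's output and a human reading of the same text. The exact settings used in the paper (frame length, clamping, alignment) are not given here, so the following Python/NumPy sketch assumes non-overlapping 256-sample frames, a per-frame SNR clamped to [-10, 35] dB (a common convention in speech-quality work, not necessarily the authors' choice), and two signals that are already time-aligned at the same sampling rate. The function name `segmental_snr` and its parameters are hypothetical.

```python
import numpy as np


def segmental_snr(reference, synthesized, frame_len=256,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR (dB) of `synthesized` measured against `reference`.

    Assumptions for this sketch (not taken from the paper): both signals
    use the same sampling rate and are already time-aligned; frames are
    non-overlapping blocks of `frame_len` samples; each frame's SNR is
    clamped to [min_db, max_db] before averaging.
    """
    ref = np.asarray(reference, dtype=float)
    syn = np.asarray(synthesized, dtype=float)
    n = min(len(ref), len(syn))          # compare only the common length
    n_frames = n // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")

    # Split the common portion into (n_frames, frame_len) blocks.
    ref_frames = ref[:n_frames * frame_len].reshape(n_frames, frame_len)
    syn_frames = syn[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Per-frame energy of the reference and of the difference ("noise").
    signal_energy = np.sum(ref_frames ** 2, axis=1)
    noise_energy = np.sum((ref_frames - syn_frames) ** 2, axis=1)

    # eps keeps silent or identical frames from producing log(0) or x/0.
    frame_snr = 10.0 * np.log10((signal_energy + eps) / (noise_energy + eps))
    return float(np.mean(np.clip(frame_snr, min_db, max_db)))


if __name__ == "__main__":
    fs = 16000                               # hypothetical sampling rate
    t = np.arange(fs) / fs
    human = np.sin(2 * np.pi * 220 * t)      # stand-in for the human reading
    system = 0.5 * human                     # stand-in for a TTS output
    print(round(segmental_snr(human, system), 2))  # prints 6.02
```

A real comparison between a synthesizer and a human speaker would first need the two recordings aligned in time (for example with dynamic time warping), because a plain sample-by-sample difference between two independently produced utterances is dominated by misalignment; this sketch does not perform that step.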

Figs. 1–23 (images not reproduced)

References

  • Abdel-Hamid, O., Abdou, S. M., & Rashwan, M. (2006). Improving Arabic HMM-based speech synthesis quality. INTERSPEECH.

  • Acapela speech synthesizer. (2014). World Wide Web electronic publication. http://www.acapela-group.com/text-to-speech-interactive-demo.html.

  • Ahmad, J. (2007). Optical character recognition system for Arabic text using cursive multi-directional approach. Journal of Computer Science, 3, 549–555.

  • Ali, M. E. M., Al-Muhtaseb, H., & Al-Ghamdi, M. (2007). Automatic segmentation of Arabic speech. In Workshop on information technology and Islamic sciences, Imam Mohammad Ben Saud University, Riyadh, March.

  • AlKhateeb, J. H., Ren, J., Ipson, S., & Jiang, J. (2008). Knowledge-based baseline detection and optimal thresholding for words segmentation in efficient preprocessing of handwritten Arabic text. In Fifth international conference on information technology: New generations (pp. 1158–1159).

  • Al-Saud, N. B., & Al-Khalifa, H. S. (2012). An initial comparative study of Arabic speech synthesis engines in iOS and Android. In Proceedings of the 14th international conference on information integration and web-based applications & services, IIWAS ’12 (pp. 411–414). New York, NY: ACM.

  • Al-Wabil, A., Al-Khalifa, H., & Al-Saleh, W. (2007). Arabic text-to-speech synthesis: A preliminary evaluation. In C. Montgomerie & J. Seale (Eds.), Proceedings of world conference on educational multimedia, hypermedia and telecommunications 2007 (pp. 4423–4430). Vancouver: AACE.

  • Alyazeed, M. A., Al-Ghoneimy, M. R., & Mohammad, M. (1989). Comparison of syllable and sub-syllable methods for speech synthesis. In Proceedings of the second conference on Arabic computational linguistics, Kuwait.

  • Arabi, automatic Arabic text-to-speech system. (2014). World Wide Web electronic publication. http://www.arabinlp.com/Systems/Demo_SystemsTTS.php?pageLang=en.

  • Assaf, M. (2005). A prototype of an Arabic diphone speech synthesizer in Festival. Master’s thesis, Uppsala University.

  • Atallah, A. S., & Omar, K. (2008). Methods of Arabic language baseline detection: The state of the art. International Journal of Computer Science and Network Security (IJCSNS), 8, 137–143.

  • Bennett, C. L. (2005). Large scale evaluation of corpus-based synthesizers: Results and lessons from the Blizzard Challenge 2005. In Proceedings of Interspeech 2005, Lisbon.

  • Black, A. W., & Tokuda, K. (2005). The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets. In Proceedings of Interspeech 2005 (pp. 77–80). Lisbon.

  • Borovikov, E., & Zavorin, I. (2012). A multi-stage approach to Arabic document analysis. In V. Märgner & H. El Abed (Eds.), Guide to OCR for Arabic scripts (pp. 55–78). London: Springer.

  • Campbell, N. (2007). Evaluation of speech synthesis. In L. Dybkjaer & H. Minker (Eds.), Evaluation of text and speech systems. From reading machines to talking machines. Dordrecht: Springer.

  • Chabchoub, A., & Cherif, A. (2011). An automatic MBROLA tool for high quality Arabic speech synthesis. International Journal of Computer Applications, 36(1), 1–5. Published by Foundation of Computer Science, New York, USA.

  • Clark, R. A. J., Podsiadło, M., Fraser, M., Mayo, C., & King, S. (2007). Statistical analysis of the Blizzard Challenge 2007 listening test results. In Proceedings of Blizzard workshop (in Proc. SSW6), Bonn.

  • Damper, R., Marchand, Y., Adamson, M., & Gustafson, K. (1999). Evaluating the pronunciation component of text-to-speech systems for English: A performance comparison of different approaches. Computer Speech and Language, 13(2), 155–176.

  • Dutoit, T., Pagel, V., Pierret, N., Bataille, F., & Van der Vrecken, O. (1996). The MBROLA project: Towards a set of high quality speech synthesizers free of use for non-commercial purposes. In Proceedings of fourth international conference on spoken language. ICSLP 96 (vol. 3, pp. 1393–1396).

  • El-Imam, Y. (1989). An unrestricted vocabulary Arabic speech synthesis system. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(12), 1829–1845.

  • Elshafei, M. (1991). Toward an Arabic text-to-speech system. Arabian Journal for Science and Engineering, 16(4B), 565–583.

  • Elshafei, M., Al-Muhtaseb, H., & Al-Ghamdi, M. (2002). Techniques for high quality Arabic speech synthesis. Information Sciences, 140(3–4), 255–267.

  • Fraser, M., & King, S. (2007). The Blizzard Challenge 2007. In Proceedings of Blizzard workshop (in Proc. SSW6), Bonn.

  • Google translate. (2014). World Wide Web electronic publication. http://translate.google.com/.

  • Hamad, M., & Hussain, M. (2011). Arabic text-to-speech synthesizer. In The 2011 IEEE student conference on research and development (SCOReD) (pp. 409–414). IEEE.

  • Hansen, J. H., & Pellom, B. L. (1998). An effective quality evaluation protocol for speech enhancement algorithms. ICSLP, 7, 2819–2822.

  • Hirst, D., & Di Cristo, A. (1998). Intonation systems: A survey of twenty languages (1st ed.). Cambridge: Cambridge University Press.

  • Hon, H., Acero, A., Huang, X., Liu, J., & Plumpe, M. (1998). Automatic generation of synthesis units for trainable text-to-speech systems. In Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, 1998 (vol. 1, pp. 293–296). IEEE.

  • Hunt, A. J., & Black, A. W. (1996). Unit selection in a concatenative speech synthesis system using a large speech database. In Proceedings of 1996 IEEE international conference on acoustics, speech, and signal processing, 1996. ICASSP-96 (vol. 1, pp. 373–376). IEEE.

  • Indumathi, A., & Chandra, E. (2012). Survey on speech synthesis. Signal Processing: An International Journal (SPIJ), 6(5), 140.

  • Jayousi, A. Q. M. A. (2007). Arabic text-to-speech synthesizer.

  • Khalifa, O., Obaid, M., Naji, A., & Daoud, J. I. (2011). A rule-based Arabic text-to-speech system based on hybrid synthesis technique. Australian Journal of Basic and Applied Sciences, 5(6), 342–354.

  • Khalil, K., & Adnan, C. (2013). Arabic HMM-based speech synthesis. In International conference on electrical engineering and software applications (ICEESA), 2013 (pp. 1–5).

  • Klatt, D. H. (1987). Review of text-to-speech conversion for English. Journal of the Acoustical Society of America, 82(3), 737–793.

  • Kondo, K. (2012). Subjective quality measurement of speech. Berlin: Springer.

  • Leila, C., Maamar, K., & Salim, C. (2011). Combining neural networks for Arabic handwriting recognition. In 10th international symposium on programming and systems (ISPS), 2011 (pp. 74–79).

  • Lorigo, L. M., & Govindaraju, V. (2006). Offline Arabic handwriting recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 712–724.

  • Nuance vocalizer. (2014). World Wide Web electronic publication. http://enterprisecontent.nuance.com/vocalizer5-network-demo/index.html.

  • Rashad, M. Z., El-Bakry, H. M., & Isma’il, I. R. (2010). Diphone speech synthesis system for Arabic using MARY TTS. International Journal of Computer Science and Information Technology (IJCSIT), 2(4), 18–26.

  • Rashwan, M. A., Fakhr, M. W., Attia, M., & El-Mahallawy, M. S. (2007). Arabic OCR system analogous to HMM-based ASR systems: Implementation and evaluation. Journal of Engineering and Applied Science (JEAS), 54(6), 653.

  • Sakhr speech synthesizer. (2014). World Wide Web electronic publication. http://www.sakhr.com/tts/TTS_Demo.aspx.

  • Schröder, M., & Trouvain, J. (2003). The German text-to-speech synthesis system MARY: A tool for research, development and teaching. International Journal of Speech Technology, 6(4), 365–377.

  • Shaker, N., Abou-Zleikha, M., & Al Dakkak, O. (2008). SSML for Arabic language. In Text, speech and dialogue (pp. 657–664). Springer.

  • Sluijter, A., Bosgoed, E., Kerkhoff, J., Meier, E., Rietveld, T., Swerts, M., et al. (1998). Evaluation of speech synthesis systems for Dutch in telecommunication applications. In Proceedings of the Third ESCA/COCOSDA International Workshop on Speech Synthesis. Jenolan Caves.

  • Speechworks solution division from ScanSoft, Peabody, MA (2004). White paper: Assessing text-to-speech system quality. Technical report.

  • SSML. (2005). SSML 1.0 say-as attribute values. W3C Working Group Note, 26 May 2005.

  • Text to speech by iSpeech. (2014). World Wide Web electronic publication. http://www.ispeech.org/text.to.speech.

  • Tratz, S. C. (2014). Accurate Arabic script language/dialect classification. DTIC Document: Technical report.

  • Youssef, A., & Emam, O. (2004). An Arabic TTS system based on the IBM trainable speech synthesizer. JEP-TALN: Le traitement automatique de l’arabe.

  • Zeki, A. (2005). The segmentation problem on Arabic character recognition: The state of the art. In 1st international conference on information and communication technology (ICICT) (pp. 48–57). Karachi, Pakistan.

Acknowledgments

This research project is funded by the Jordanian Scientific Research Support Fund No. EIT/1/05/2011. We thank Prof. Sameer Istetiah of the Department of Arabic at Yarmouk University, a well-known expert in the Arabic language, who provided us with the text used in this study.

Author information

Corresponding author

Correspondence to Iyad Abu Doush.

Appendix: Questionnaire for testing speech synthesizer applications

Clearness/naturalness

  • Q1) Is the voice pleasant to listen to?

    1. Very natural
    2. Natural
    3. OK
    4. Unnatural
    5. Very unnatural

Speed

  • Q2) Does the system speak at an adequate speed?

    1. Much too fast
    2. Too fast
    3. Fast/normal
    4. Too slow
    5. Much too slow

Sound quality

  • Q3) Do you consider the system to have good sound quality?

    1. Very bad
    2. Bad
    3. Neutral
    4. Good
    5. Very good

Pronunciation

  • Q4) How easy was it to understand the words?

    1. Very hard
    2. Hard
    3. Neutral
    4. Easy
    5. Very easy

  • Q5) Did you have to concentrate a lot to understand what the voice said?

    1. Needs a lot of attention
    2. Some attention on some words
    3. Normal attention
    4. Little attention
    5. No attention needed

  • Q6) How did you find the pronunciation?

    1. Extremely annoying
    2. Very annoying
    3. Annoying
    4. Slightly annoying
    5. Not annoying

Clearness

  • Q7) To what extent is the voice clear?

    1. Very little
    2. Little
    3. Neutral
    4. Much
    5. Very much

  • Q8) Was the voice easy to understand?

    1. Very hard
    2. Hard
    3. Neutral
    4. Easy
    5. Very easy

Stress/intonation

  • Q9) What do you think of the intonation of the voice?

    1. Very bad
    2. Bad
    3. Neutral
    4. Good
    5. Very good

  • Q10) How did you find the stress?

    1. Extremely annoying
    2. Very annoying
    3. Annoying
    4. Slightly annoying
    5. Not annoying

Finding errors

  • Q11) Does the system make many pronunciation mistakes?

    1. Too many
    2. Many
    3. Neutral
    4. Few
    5. Very few
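The appendix does not say how the answers were turned into scores, and its answer scales do not all point the same way: in Q1 option 1 is the most favourable answer, in Q2 the middle option (3) is the ideal speed, and in Q3 to Q11 option 5 is the most favourable. The Python sketch below shows one hypothetical way to put all answers on a common 1-to-5 scale where 5 is best and then average them per questionnaire heading. The `ORIENTATION` table, the folding of Q2 around its middle option, and the names `normalize` and `category_means` are assumptions for illustration, not the authors' procedure.

```python
from statistics import mean

# Hypothetical orientation of each question's options 1..5, read off the appendix:
#   "lower":  option 1 is the most favourable answer (Q1 starts at "Very natural")
#   "middle": option 3 is ideal (Q2 runs from too fast to too slow)
#   "higher": option 5 is the most favourable answer (Q3 to Q11)
ORIENTATION = {"Q1": "lower", "Q2": "middle"}
ORIENTATION.update({f"Q{i}": "higher" for i in range(3, 12)})

# Questionnaire headings used to group the questions.
CATEGORY = {
    "Q1": "Clearness/naturalness", "Q2": "Speed", "Q3": "Sound quality",
    "Q4": "Pronunciation", "Q5": "Pronunciation", "Q6": "Pronunciation",
    "Q7": "Clearness", "Q8": "Clearness",
    "Q9": "Stress/intonation", "Q10": "Stress/intonation",
    "Q11": "Finding errors",
}


def normalize(question, option):
    """Map a chosen option (1..5) to a score where 5 is always the best answer."""
    if not 1 <= option <= 5:
        raise ValueError(f"{question}: option must be in 1..5, got {option}")
    kind = ORIENTATION[question]
    if kind == "higher":
        return option
    if kind == "lower":
        return 6 - option
    # "middle": 3 -> 5, 2 or 4 -> 3, 1 or 5 -> 1 (distance from the ideal speed)
    return 5 - 2 * abs(option - 3)


def category_means(answer_sheets):
    """Average normalized scores per heading over all listeners' answer sheets.

    Each answer sheet is a dict such as {"Q1": 2, "Q2": 3}, one per listener.
    """
    scores = {}
    for sheet in answer_sheets:
        for question, option in sheet.items():
            scores.setdefault(CATEGORY[question], []).append(normalize(question, option))
    return {heading: mean(values) for heading, values in scores.items()}


if __name__ == "__main__":
    # Two made-up answer sheets for one synthesizer (illustrative data only).
    sheets = [{"Q1": 2, "Q2": 3, "Q3": 4}, {"Q1": 1, "Q2": 4, "Q3": 5}]
    print(category_means(sheets))
```

Folding Q2 this way treats "too fast" and "too slow" as equally bad and discards the direction of the error; if the direction matters, the raw Q2 answers would need to be reported as well.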

About this article

Cite this article

Abu Doush, I., Alkhatib, F. & Bsoul, A.A.R. What we have and what is needed, how to evaluate Arabic Speech Synthesizer?. Int J Speech Technol 19, 415–432 (2016). https://doi.org/10.1007/s10772-015-9304-6

  • DOI: https://doi.org/10.1007/s10772-015-9304-6
