Balanced Arabic corpus design for speech synthesis

International Journal of Speech Technology

Abstract

This paper presents the design and validation of a phonetically balanced speech corpus for the Arabic language. Designing and developing a rich, phonetically balanced corpus in optimal context is one of the key issues in building high-quality text-to-speech synthesis systems. The corpus is rich in the sense that it must contain all possible phonemes in their left and right contexts, and balanced in the sense that it respects the phonetic distribution of the language. We propose a new methodology for designing and implementing such a corpus for speech synthesis. The paper explains the whole creation process, from the design stage through corpus creation and recording to the segmentation of the speech corpus. The resulting speech corpus contains 202 sentences with 6174 phonemes. To validate it, an Arabic speech synthesis system based on Hidden Markov Models was developed.
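To make the two properties concrete, the following is a minimal sketch under stated assumptions, not the authors' methodology, which this abstract does not detail. It approximates "rich" as coverage of adjacent phoneme pairs (diphones), selected with a generic greedy set-cover heuristic, and "balanced" as a small distance between the phoneme distribution of the selected sentences and that of a reference pool. The function names, the toy phoneme sequences, and the choice of total variation distance are all hypothetical and used only for illustration.

```python
from collections import Counter


def diphones(phonemes):
    """Set of adjacent phoneme pairs, i.e. each phoneme with its right neighbour."""
    return set(zip(phonemes, phonemes[1:]))


def greedy_select(sentences, max_sentences):
    """Greedy set cover: repeatedly pick the sentence that adds the most unseen diphones.

    Ties are broken in favour of the lowest sentence index.
    Returns the chosen indices and the set of covered diphones.
    """
    covered = set()
    chosen = []
    remaining = dict(enumerate(sentences))
    while remaining and len(chosen) < max_sentences:
        best = max(
            remaining,
            key=lambda i: (len(diphones(remaining[i]) - covered), -i),
        )
        gain = diphones(remaining[best]) - covered
        if not gain:  # nothing new can be covered, stop early
            break
        covered |= gain
        chosen.append(best)
        del remaining[best]
    return chosen, covered


def phoneme_distribution(sentences):
    """Relative frequency of each phoneme over a list of phoneme sequences."""
    counts = Counter(p for sentence in sentences for p in sentence)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def balance_gap(selected, reference):
    """Total variation distance between two distributions (0 means identical)."""
    keys = set(selected) | set(reference)
    return 0.5 * sum(abs(selected.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)


# Hypothetical toy pool: each sentence is already a list of phoneme symbols.
pool = [
    ["k", "a", "t", "a", "b", "a"],
    ["b", "a", "t", "a"],
    ["k", "a", "b"],
    ["d", "a", "r"],
]

chosen, covered = greedy_select(pool, max_sentences=2)
gap = balance_gap(
    phoneme_distribution([pool[i] for i in chosen]),
    phoneme_distribution(pool),
)
print(chosen, len(covered), round(gap, 3))
```

In this sketch the two goals are measured separately: the greedy step maximizes context coverage, and `balance_gap` only reports how far the selection drifts from the pool's distribution. A real design would need a phonemizer for diacritized Arabic text and some way to trade coverage against balance, both of which are outside what the abstract specifies. For scale, the reported corpus of 6174 phonemes over 202 sentences averages about 30.6 phonemes per sentence.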

Acknowledgements

We would like to thank Professor Levent Arslan for giving us the opportunity to visit SESTEK and the BUSIM laboratory, for the excellent working environment, for allowing us to use the recording studio, for his help and guidance during our internship, and for the exceptional exchanges with the BUSIM and SESTEK teams.

Author information

Corresponding author

Correspondence to Aissa Amrouche.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amrouche, A., Abed, A., Ferrat, K. et al. Balanced Arabic corpus design for speech synthesis. Int J Speech Technol 24, 747–759 (2021). https://doi.org/10.1007/s10772-021-09846-8

  • DOI: https://doi.org/10.1007/s10772-021-09846-8
