
Arabic Diacritization with Gated Recurrent Unit

Published: 02 May 2018

Abstract

Arabic and similar languages require diacritics to determine the parameters needed to pronounce and identify every part of the speech correctly. Diacritization is therefore a crucial step when performing Natural Language Processing (NLP) on Arabic. In this paper, we use a gated recurrent unit network as a language-independent framework for Arabic diacritization. The end-to-end approach allows the system to be trained exclusively on vocalized text, without external resources. We evaluate the system against state-of-the-art results from the literature. We demonstrate that it achieves state-of-the-art results and improves the learning process, with better training and testing times.
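
To illustrate what training exclusively on vocalized text can mean in practice, the following is a minimal sketch of how input/target pairs might be derived from a diacritized string alone: the undiacritized characters serve as the input sequence and the marks attached to each character serve as its label. This is not the authors' preprocessing. The function name `split_vocalized`, the label format, and the choice of the Unicode range U+064B to U+0652 as the diacritic set are assumptions made for illustration.

```python
# Assumption: the diacritic set is the Unicode range U+064B..U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_vocalized(text):
    """Split vocalized text into base characters and per-character labels.

    Returns two lists of equal length: the undiacritized characters and,
    for each one, the (possibly empty) string of diacritics that follow it.
    This is a hypothetical helper, not part of any published system.
    """
    chars, labels = [], []
    for ch in text:
        if ch in DIACRITICS:
            if chars:
                # Attach the mark to the preceding base character.
                labels[-1] += ch
            # A diacritic with no preceding base character is ignored.
        else:
            chars.append(ch)
            labels.append("")
    return chars, labels


# Example: kaf+fatha, ta+fatha, ba+fatha (the vocalized word "kataba").
chars, labels = split_vocalized("\u0643\u064E\u062A\u064E\u0628\u064E")
undiacritized = "".join(chars)  # model input; `labels` holds the targets
```

A sequence model such as a gated recurrent unit network could then be trained to predict `labels` from `chars`, so no lexicon or morphological analyzer is needed to build the training set.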


Cited By

  • (2020) Multi-components System for Automatic Arabic Diacritization. Advances in Information Retrieval, 341-355. DOI: 10.1007/978-3-030-45439-5_23. Online publication date: 8-Apr-2020

Published In

LOPAL '18: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications
May 2018
357 pages
ISBN:9781450353045
DOI:10.1145/3230905

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Arabic diacritization
  2. Gated recurrent unit
  3. deep learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

LOPAL '18
LOPAL '18: Theory and Applications
May 2 - 5, 2018
Rabat, Morocco

Acceptance Rates

LOPAL '18 Paper Acceptance Rate 61 of 141 submissions, 43%;
Overall Acceptance Rate 61 of 141 submissions, 43%
