research-article

Hybrid Pipeline for Building Arabic Tunisian Dialect-standard Arabic Neural Machine Translation Model from Scratch

Published: 14 April 2023

Abstract

Deep Learning is among the most promising approaches to machine translation, and it has achieved impressive results when large amounts of parallel data are available for high-resource languages. For low-resource languages such as the Arabic dialects, however, Deep Learning models fall short because parallel corpora are scarce. In this article, we present a method for creating a parallel corpus and using it to build an effective NMT model that translates Tunisian Dialect texts found on social networks into MSA. To this end, we propose a set of data augmentation methods that enlarge the state-of-the-art parallel corpus. Evaluating the impact of this step, we observed that it effectively boosted both the size and the quality of the corpus. Using the resulting corpus, we then compare the effectiveness of CNN, RNN, and Transformer models for translating Tunisian Dialect into MSA. Experiments show that the Transformer model achieves the best translation, with a BLEU score of 60, versus 33.36 for the RNN model and 53.98 for the CNN model.
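The model comparison above is scored with BLEU. As a rough, self-contained illustration of what that metric computes (modified n-gram precision combined with a brevity penalty), here is a minimal sketch of corpus-level BLEU for a single reference per sentence; the helper names (`corpus_bleu`, `ngrams`) are ours, and this is not the evaluation toolkit the authors used:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 100]: geometric mean of modified
    1..max_n-gram precisions times a brevity penalty, with one
    reference per hypothesis and no smoothing."""
    clipped = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_grams, ref_grams = ngrams(hyp, n), ngrams(ref, n)
            totals[n - 1] += max(0, len(hyp) - n + 1)
            # Clip each hypothesis n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_grams[g]) for g, c in hyp_grams.items())
    if min(clipped) == 0:   # some precision is zero; unsmoothed BLEU is 0
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)
```

An identical hypothesis and reference scores 100; real evaluations normally rely on a standard toolkit (e.g., sacreBLEU) so that tokenization and smoothing are handled consistently across systems.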

References

[1]
R. Al-Ibrahim and R. M. Duwairi. 2020. Neural machine translation from Jordanian dialect to modern standard Arabic. In Proceedings of the 11th International Conference on Information and Communication Systems (ICICS). 173–178.
[2]
Alina Karakanta, Jon Dehdari, and Josef van Genabith. 2018. Neural machine translation for low-resource languages without parallel corpora. Mach. Translat. 32 (2018), 167–189.
[3]
Ebtesam H. Almansor and Ahmed Al-Ani. 2018. A hybrid neural machine translation technique for translating low resource languages. In Machine Learning and Data Mining in Pattern Recognition. Springer International Publishing, Cham, 347–356.
[4]
Ebtesam H. Almansor and Ahmed Al-Ani. 2017. Translating dialectal Arabic as low resource language using word embedding. In Proceedings of the International Conference Recent Advances in Natural Language Processing. INCOMA Ltd., 52–57.
[5]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2016. Neural Machine Translation by Jointly Learning to Align and Translate. arxiv:1409.0473 [cs.CL].
[6]
Laith H. Baniata, Seyoung Park, and Seong-Bae Park. 2018. A neural machine translation model for Arabic dialects that utilizes multitask learning (MTL). Computat. Intell. Neurosci. 10 (Dec. 2018).
[7]
Houda Bouamor, Nizar Habash, and Kemal Oflazer. 2014. A multidialectal parallel corpus of Arabic. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA), 1240–1245. Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/523_Paper.pdf.
[8]
Houda Bouamor, Nizar Habash, Mohammad Salameh, Wajdi Zaghouani, Owen Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa, Fadhl Eryani, Alexander Erdmann, and Kemal Oflazer. 2018. The MADAR Arabic dialect corpus and lexicon. In Proceedings of the 11th Language Resources and Evaluation Conference. European Language Resources Association (ELRA). Retrieved from https://www.aclweb.org/anthology/L18-1535.
[9]
Rahma Boujelbane, Mariem Ellouze Khemekhem, and Lamia Hadrich Belguith. 2013. Mapping rules for building a Tunisian dialect lexicon and generating corpora. In Proceedings of the 6th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 419–428. Retrieved from https://www.aclweb.org/anthology/I13-1048.
[10]
Kehai Chen, Rui Wang, Masao Utiyama, Lemao Liu, Akihiro Tamura, Eiichiro Sumita, and Tiejun Zhao. 2017. Neural machine translation with source dependency representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
[11]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv:1810.04805 [cs.CL].
[12]
Fatma El-zahraa El-taher, Alaa Aldin Hammouda, and Salah Abdel-Mageid. 2016. Automation of understanding textual contents in social networks. In Proceedings of the International Conference on Selected Topics in Mobile Wireless Networking (MoWNeT). 1–7.
[13]
Fei Gao, Jinhua Zhu, Lijun Wu, Yingce Xia, Tao Qin, Xueqi Cheng, Wengang Zhou, and Tie-Yan Liu. 2019. Soft contextual data augmentation for neural machine translation. In Proceedings of the Association for Computational Linguistics.
[14]
M. Graja, M. Jaoua, and L. Hadrich Belguith. 2015. Statistical framework with knowledge base integration for robust speech understanding of the Tunisian dialect. IEEE/ACM Trans. Audio, Speech Lang. Process. 23, 12 (2015), 2311–2321.
[15]
Ahmed Hamdi, Rahma Boujelbane, Nizar Habash, and Alexis Nasr. 2013. The effects of factorizing root and pattern mapping in bidirectional Tunisian–standard Arabic machine translation. In Proceedings of the MT Summit. Retrieved from https://hal.archives-ouvertes.fr/hal-00908761.
[16]
Serena Jeblee, Weston Feely, Houda Bouamor, Alon Lavie, Nizar Habash, and Kemal Oflazer. 2014. Domain and dialect adaptation for machine translation into Egyptian Arabic. In Proceedings of the EMNLP Workshop on Arabic Natural Language Processing (ANLP). Association for Computational Linguistics, 196–206.
[17]
Jinyi Zhang and Tadahiro Matsumoto. 2019. Corpus augmentation by sentence segmentation for low-resource neural machine translation. CoRR abs/1905.08945 (2019).
[18]
Karima Meftouh, Salima Harrat, S. Jamoussi, M. Abbas, and Kamel Smaïli. 2015. Machine translation experiments on PADIC: A parallel Arabic dialect corpus. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. 26–34.
[19]
Saméh Kchaou, Rahma Boujelbane, and Lamia Hadrich-Belguith. 2020. Parallel resources for Tunisian Arabic dialect translation. In Proceedings of the 5th Arabic Natural Language Processing Workshop. Association for Computational Linguistics, 200–206. Retrieved from https://www.aclweb.org/anthology/2020.wanlp-1.18.
[20]
Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations. Association for Computational Linguistics.
[21]
Julia Kreutzer, Jasmijn Bastings, and Stefan Riezler. 2020. Joey NMT: A Minimalist NMT Toolkit for Novices. arxiv:1907.12484 [cs.CL].
[22]
Surafel M. Lakew, Mauro Cettolo, and Marcello Federico. 2018. A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation. arxiv:1806.06957 [cs.CL].
[23]
Y. Li, X. Li, Y. Yang, and R. Dong. 2020. A diverse data augmentation strategy for low-resource neural machine translation. Information 11, 255 (2020), 2078–2489.
[24]
Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. Data augmentation for low-resource neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 567–573.
[25]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arxiv:1912.01703 [cs.LG].
[26]
Aquia Richburg, Ramy Eskander, Smaranda Muresan, and Marine Carpuat. 2020. An evaluation of subword segmentation strategies for neural machine translation of morphologically rich languages. In Proceedings of the 4th Widening Natural Language Processing Workshop. Association for Computational Linguistics, 151–155.
[27]
Alexander Rush. 2018. The annotated transformer. In Proceedings of the Workshop for NLP Open Source Software (NLP-OSS). Association for Computational Linguistics, 52–60.
[28]
Wael Salloum and Nizar Habash. 2012. Elissa: A dialectal to standard Arabic machine translation system. In Proceedings of COLING 2012: Demonstration Papers. The COLING 2012 Organizing Committee, 385–392. Retrieved from https://www.aclweb.org/anthology/C12-3048.
[29]
Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 464–468.
[30]
Bashar Talafha, Mohammad Ali, Muhy Eddin Za’ter, Haitham Seelawi, Ibraheem Tuffaha, Mostafa Samir, Wael Farhan, and Hussein T. Al-Natsheh. 2020. Multi-dialect Arabic BERT for Country-level Dialect Identification. arxiv:2007.05612 [cs.CL].
[31]
Allahsera Auguste Tapo, Bakary Coulibaly, Sébastien Diarra, Christopher Homan, Julia Kreutzer, Sarah Luger, Arthur Nagashima, Marcos Zampieri, and Michael Leventhal. 2020. Neural machine translation for extremely low-resource African languages: A case study on Bambara. In Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages. Association for Computational Linguistics.
[32]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR abs/1706.03762 (2017).
[33]
Shijie Wu and Mark Dredze. 2019. Beto, Bentz, Becas: The Surprising Cross-lingual Effectiveness of BERT. arxiv:1904.09077 [cs.CL].
[34]
Inès Zribi, M. Ellouze, L. Belguith, and P. Blache. 2017. Morphological disambiguation of Tunisian dialect. J. King Saud Univ. Comput. Inf. Sci. 29 (2017), 147–155.



    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 3
    March 2023
    570 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3579816

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 April 2023
    Online AM: 02 November 2022
    Accepted: 05 October 2022
    Received: 26 January 2022
    Published in TALLIP Volume 22, Issue 3


    Author Tags

    1. Neural Machine Translation
    2. data augmentation
    3. Arabic Tunisian Dialect
    4. Modern Standard Arabic

    Qualifiers

    • Research-article


    Cited By

    • (2024) A Survey on Machine Translation of Low-Resource Arabic Dialects. In 2024 15th International Conference on Information and Communication Systems (ICICS), 1–6. DOI: 10.1109/ICICS63486.2024.10638285. Online publication date: 13 August 2024.
    • (2024) Latest Research in Data Augmentation for Low Resource Language Text Translation: A Review. In 2024 International Conference on Computer, Control, Informatics and its Applications (IC3INA), 185–190. DOI: 10.1109/IC3INA64086.2024.10732042. Online publication date: 9 October 2024.
    • (2024) BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization. In 2024 IEEE International Conference on Big Data (BigData), 1635–1644. DOI: 10.1109/BigData62323.2024.10826131. Online publication date: 15 December 2024.
    • (2024) Crossing Linguistic Barriers: A Hybrid Attention Framework for Chinese-Arabic Machine Translation. In 2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), 1–6. DOI: 10.1109/ACDSA59508.2024.10467398. Online publication date: 1 February 2024.
    • (2024) Arabic Text Formality Modification: A Review and Future Research Directions. IEEE Access. DOI: 10.1109/ACCESS.2024.3511661. Online publication date: 2024.
    • (2023) Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects. In 2023 7th IEEE Congress on Information Science and Technology (CiSt), 293–298. DOI: 10.1109/CiSt56084.2023.10410009. Online publication date: 16 December 2023.
