Automatic Text Summarization for Moroccan Arabic Dialect Using an Artificial Intelligence Approach

Gaanoun, Kamel; Naira, Abdou Mohamed; Allak, Anass; Benelallam, Imade

doi:10.1007/978-3-031-06458-6_13

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 449)

Included in the following conference series: International Conference on Business Intelligence

Abstract

A major advantage of artificial intelligence is its ability to perform tasks automatically, quickly, and at a human-like level; this is needed in many fields, and particularly in Automatic Text Summarization (ATS). Recent years have brought several advances in both extractive and abstractive approaches, notably with the advent of sequence-to-sequence (seq2seq) and Transformer-based models. Despite this, Arabic remains largely underrepresented in the field, owing to its complexity and a lack of ATS datasets. Although some ATS work exists for Modern Standard Arabic (MSA), there is a lack of such work for the Arabic dialects, which are more prevalent on social networking platforms and the Internet in general. As an initial step toward meeting this need, we present the first ATS work for the Moroccan dialect, known as Darija. This paper introduces the first dataset intended for summarizing articles written in Darija. In addition, we present state-of-the-art results, measured with the ROUGE metric, for extractive methods based on BERT embeddings and K-means clustering, as well as abstractive methods based on Transformer models.
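
To make the extractive setting in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It clusters sentence vectors with K-means, keeps the sentence closest to each centroid, and scores a summary with ROUGE-N recall. Everything named here is an assumption made for illustration: `embed` is a bag-of-words stand-in for a BERT sentence embedding, and `kmeans`, `extractive_summary`, and `rouge_n_recall` are hypothetical helpers. The centroid-nearest selection rule is one common choice and may differ from what the paper does.

```python
import math
import re
from collections import Counter


def embed(sentence, vocab):
    """Hypothetical stand-in for a BERT embedding: L2-normalised bag of words."""
    counts = Counter(re.findall(r"\w+", sentence.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; initialised from the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def extractive_summary(sentences, k):
    """Return at most k sentences, one per cluster, in document order."""
    k = min(k, len(sentences))
    vocab = sorted({w for s in sentences for w in re.findall(r"\w+", s.lower())})
    vectors = [embed(s, vocab) for s in sentences]
    centroids = kmeans(vectors, k)
    chosen = {
        min(range(len(sentences)), key=lambda i: sq_dist(vectors[i], c))
        for c in centroids
    }
    return [sentences[i] for i in sorted(chosen)]


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""
    def ngrams(text):
        toks = re.findall(r"\w+", text.lower())
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total if total else 0.0
```

In a real pipeline, `embed` would be replaced by vectors from a pretrained encoder and the clustering by a library implementation; the selection and scoring logic would stay the same in outline.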

Notes

1. We retained articles with at least 30 words.
2. https://github.com/KamelGaanoun/MoroccanSummarization
3. https://commoncrawl.org

Author information

Authors and Affiliations

Institut National de Statistique et d’Economie Appliquée, SI2M Lab., Rabat, Morocco
Kamel Gaanoun, Abdou Mohamed Naira, Anass Allak & Imade Benelallam
AIOX LABS, Rabat, Morocco
Abdou Mohamed Naira, Anass Allak & Imade Benelallam

Corresponding author

Correspondence to Kamel Gaanoun.

Editor information

Editors and Affiliations

Sultan Moulay Slimane University, Beni-Mellal, Morocco
Mohamed Fakir
Sultan Moulay Slimane University, Beni-Mellal, Morocco
Mohamed Baslam
Sultan Moulay Slimane University, Beni-Mellal, Morocco
Rachid El Ayachi

Cite this paper

Gaanoun, K., Naira, A.M., Allak, A., Benelallam, I. (2022). Automatic Text Summarization for Moroccan Arabic Dialect Using an Artificial Intelligence Approach. In: Fakir, M., Baslam, M., El Ayachi, R. (eds) Business Intelligence. CBI 2022. Lecture Notes in Business Information Processing, vol 449. Springer, Cham. https://doi.org/10.1007/978-3-031-06458-6_13

DOI: https://doi.org/10.1007/978-3-031-06458-6_13
Published: 13 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06457-9
Online ISBN: 978-3-031-06458-6
eBook Packages: Computer Science, Computer Science (R0)
