Automatic Arabic text summarization: a survey

Al-Saleh, Asma Bader; Menai, Mohamed El Bachir

doi:10.1007/s10462-015-9442-x

Published: 07 October 2015

Artificial Intelligence Review, Volume 45, pages 203–234 (2016)

Asma Bader Al-Saleh¹ &
Mohamed El Bachir Menai¹

Abstract

This survey reviews several research studies conducted in the field of Arabic text summarization. Specifically, it addresses summarization and evaluation methods, as well as the corpora used in those studies. The literature in this field is fairly limited and relatively new compared to the literature available for other languages, such as English, so there is considerable opportunity for further research in Arabic text summarization. One of the largest problems in Arabic summarization has been the absence of Arabic gold-standard summaries, although this situation is beginning to change, especially with the inclusion of Arabic in the corpora and tasks of the TAC 2011 MultiLing Pilot and the ACL 2013 MultiLing Workshop. Finally, providing the required corpora and adopting them in Arabic summarization studies remains essential.

Acknowledgments

This work was supported by the Research Center of College of Computer and Information Sciences, King Saud University. The authors are grateful for this support. Also, the authors would like to thank the anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh, 11543, Saudi Arabia
Asma Bader Al-Saleh & Mohamed El Bachir Menai

Corresponding author

Correspondence to Asma Bader Al-Saleh.

About this article

Cite this article

Al-Saleh, A.B., Menai, M.E.B. Automatic Arabic text summarization: a survey. Artif Intell Rev 45, 203–234 (2016). https://doi.org/10.1007/s10462-015-9442-x

Published: 07 October 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10462-015-9442-x
