Automatic Arabic text summarization: a survey

Artificial Intelligence Review

Abstract

This survey investigates research studies conducted in the field of Arabic text summarization. Specifically, it addresses the summarization and evaluation methods, as well as the corpora, used in those studies. The literature in this field is fairly limited and relatively recent compared with that available for other languages, such as English, which leaves considerable opportunity for further research in Arabic text summarization. In addition, one of the largest problems in Arabic summarization has been the absence of Arabic gold-standard summaries, although this situation is beginning to change, especially with the inclusion of Arabic among the corpora and tasks of the TAC 2011 MultiLing Pilot and the ACL 2013 MultiLing Workshop. Finally, providing the required corpora and adopting them in Arabic summarization studies is an essential need.
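
Gold-standard summaries matter because automatic evaluation measures score a system summary by its overlap with human-written reference summaries. As an illustration only, the sketch below computes ROUGE-N recall in the spirit of Lin (2004): clipped n-gram matches divided by the total number of reference n-grams. It is not the official ROUGE toolkit; the function names `ngrams` and `rouge_n_recall` are hypothetical, and plain whitespace tokenization is an assumption (Arabic text would normally be normalized or stemmed first, which this sketch does not do).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """ROUGE-N recall of a candidate summary against gold-standard summaries.

    Hypothetical helper for illustration. Tokenization is whitespace
    splitting only; no Arabic normalization or stemming is applied.
    """
    cand = ngrams(candidate.split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_grams = ngrams(ref.split(), n)
        # Clipped overlap: each reference n-gram is credited at most as
        # many times as it appears in the candidate.
        matched += sum(min(count, cand[g]) for g, count in ref_grams.items())
        total += sum(ref_grams.values())
    # With no reference n-grams there is nothing to recall.
    return matched / total if total else 0.0
```

For example, with the reference `ذهب الولد إلى المدرسة` and the candidate `ذهب الولد إلى البيت`, unigram recall is 3/4 = 0.75 and bigram recall is 2/3, since only the final word differs. Without any gold-standard reference, the denominator is empty and no score can be computed, which is the gap the abstract describes.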

Fig. 1
Fig. 2

Notes

  1. http://www.systransoft.com.

  2. http://www.nlpir.nist.gov/related_projects/tipster_summac.

  3. http://www.nlpir.nist.gov/projects/duc/index.html.

  4. http://www.nist.gov/tac.

  5. http://www.nist.gov.

  6. http://www.aclweb.org.

  7. http://www.nist.gov/tac/2011/Summarization/index.html.

  8. http://multiling.iit.demokritos.gr/pages/view/662/multiling-2013.

  9. http://acl2013.org/site.

  10. http://multiling.iit.demokritos.gr/pages/view/1516/multiling-2015.

  11. http://www.mturk.com.

  12. http://www.wikipedia.com.

  13. http://translate.google.com.

  14. http://www.berouge.com.

  15. http://news.google.com.

  16. https://twitter.com.

  17. http://catalog.ldc.upenn.edu/LDC2003T12.

  18. https://www.ldc.upenn.edu.

References

  • AL-Khawaldeh F, Samawi V (2015) Lexical cohesion and entailment based segmentation for Arabic text summarization (LCEAS). World Comput Sci Inf Technol J 5(3):51–60

  • Al-Radaideh Q, Afif M (2009) Arabic text summarization using aggregate similarity. In: International Arab conference on information technology (ACIT2009), Yemen

  • Al-Saeedan W, Menai M (2015) Swarm intelligence for natural language processing. Int J Artif Intell Soft Comput 5(2):117–150

  • Al-Sanie W (2005) Towards an infrastructure for Arabic text summarization using rhetorical structure theory. Master’s thesis, King Saud University, Riyadh

  • Al-Sulaiti L, Atwell ES (2006) The design of a corpus of contemporary Arabic. Int J Corpus Linguist 11(2):135–171

  • Alguliev RM, Aliguliyev RM, Isazade NR (2013a) Formulation of document summarization as a 0–1 nonlinear programming problem. Comput Ind Eng 64(1):94–102

  • Alguliev RM, Aliguliyev RM, Isazade NR (2013b) Multiple documents summarization based on evolutionary optimization algorithm. Expert Syst Appl 40(5):1675–1689

  • Alotaiby F, Foda S, Alkharashi I (2012) New approaches to automatic headline generation for Arabic documents. J Eng Comput Innov 3(1):11–25

  • Azmi AM, Al-Thanyyan S (2012) A text summarizer for Arabic. Comput Speech Lang 26(4):260–273

  • Bassiouney R, Katz EG (2012) Arabic language and linguistics. Georgetown University Press, Washington, DC

  • Belguith L, Ellouze M, Maaloul M, Jaoua M, Jaoua F, Blache P (2014) Automatic summarization. In: Zitouni I (ed) Natural language processing of semitic languages, theory and applications of natural language processing. Springer, Berlin, pp 371–408

  • Belkebir R, Guessoum A (2015) A supervised approach to Arabic text summarization using AdaBoost. In: Rocha A, Correia AM, Costanzo S, Reis LP (eds) New contributions in information systems and technologies, advances in intelligent systems and computing, vol 353, pp 227–236

  • Binwahlan MS, Salim N, Suanmali L (2010) Fuzzy swarm diversity hybrid model for text summarization. Inf Process Manage 46(5):571–588

  • Boudabous M, Maaloul M, Belguith L (2010) Digital learning for summarizing Arabic documents. In: Loftsson H, Rögnvaldsson E, Helgadóttir S (eds) Advances in natural language processing, lecture notes in computer science, vol 6233. Springer, Berlin, pp 79–84

  • Boujelben I, Jamoussi S, Hamadou AB (2014) A hybrid method for extracting relations between Arabic named entities. J King Saud Univ Comput Inf Sci 26(4):425–440. Special issue on Arabic NLP

  • Buckwalter T, Parkinson D (2011) A frequency dictionary of Arabic: core vocabulary for learners. Routledge

  • Cambria E, White B (2014) Jumping NLP curves: a review of natural language processing research [review article]. Comput Intell Mag IEEE 9(2):48–57

  • Chaibi AH, Naili M, Sammoud S (2014) Topic segmentation for textual document written in Arabic language. Procedia Comput Sci 35:437–446

  • Davies M (2010) The corpus of contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing, Oxford

  • Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • Douzidia FS, Lapalme G (2004) Lakhas, an Arabic summarization system. DUC 2004:128–135

  • Edmundson HP (1969) New methods in automatic extracting. J ACM 16(2):264–285

  • El-Fishawy N, Hamouda A, Attiya GM, Atef M (2013) Arabic summarization in Twitter social network. Ain Shams Eng J 5(2):411–420

  • El-Ghannam F, El-Shishtawy T (2013) Multi-topic multi-document summarizer. Int J Comput Sci Inf Technol 5(6):77–90

  • El-Haj M, Hammo B (2008) Evaluation of query-based Arabic text summarization system. In: International conference on natural language processing and knowledge engineering, 2008. NLP-KE ’08, pp 1–7

  • El-Haj M, Koulali R (2013) KALIMAT a multipurpose Arabic corpus. In: The 2nd workshop on Arabic corpus linguistics (WACL-2)

  • El-Haj M, Rayson P (2013) Using a keyness metric for single and multi document summarisation. In: Proceedings of the MultiLing 2013 workshop on multilingual multi-document summarization. Association for Computational Linguistics, Sofia, Bulgaria, pp 64–71

  • El-Haj M, Kruschwitz U, Fox C (2010) Using Mechanical Turk to create a corpus of Arabic summaries. In: Proceedings of the international conference on language resources and evaluation (LREC), Valletta, Malta, pp 36–39, in the language resources (LRs) and Human Language Technologies (HLT) for Semitic Languages workshop held in conjunction with the 7th international language resources and evaluation conference (LREC 2010)

  • El-Haj M, Kruschwitz U, Fox C (2011a) Exploring clustering for multi-document Arabic summarisation. In: Salem M, Shaalan K, Oroumchian F, Shakery A, Khelalfa H (eds) Information retrieval technology, lecture notes in computer science, vol 7097. Springer, Berlin, pp 550–561

  • El-Haj M, Kruschwitz U, Fox C (2011b) Multi-document Arabic text summarisation. In: Computer science and electronic engineering conference (CEEC), 2011 3rd, IEEE Xplore, pp 40–44

  • El-Haj M, Kruschwitz U, Fox C (2011c) University of Essex at the TAC 2011 multilingual summarisation pilot. In: Proceedings of the text analysis conference (TAC) 2011, MultiLing Summarisation Pilot, Maryland, USA

  • El-Haj M, Kruschwitz U, Fox C (2014) Creating language resources for under-resourced languages: methodologies, and experiments with Arabic. Language Resources and Evaluation, pp 1–32

  • Evans D, McKeown K (2005) Identifying similarities and differences across English and Arabic news. In: International conference on intelligence analysis, McLean, VA, May 2005

  • Fabri R, Gasser M, Habash N, Kiraz G, Wintner S (2014) Linguistic introduction: the orthography, morphology and syntax of semitic languages. In: Zitouni I (ed) Natural language processing of semitic languages, theory and applications of natural language processing. Springer, Berlin, pp 3–41

  • Farghaly A, Shaalan K (2009) Arabic natural language processing: challenges and solutions. ACM Trans Asian Lang Inf Process 8(4):14:1–14:22

  • Fattah MA, Ren F (2009) GA, MR, FFNN, PNN and GMM based models for automatic text summarization. Comput Speech Lang 23(1):126–144

  • Fejer H, Omar N (2014) Automatic Arabic text summarization using clustering and keyphrase extraction. In: 2014 International Conference on information technology and multimedia (ICIMU), pp 293–298

  • Frank E, Wang Y, Inglis S, Holmes G, Witten I (1998) Using model trees for classification. Mach Learn 32(1):63–76

  • Froud H, Lachkar A, Ouatik SA (2013) Arabic text summarization based on latent semantic analysis to enhance Arabic documents clustering. Int J Data Mining Knowl Manag Process (IJDKP) 3(1). doi:10.5121/ijdkp.2013.3107

  • Giannakopoulos G (2013) Multi-document multilingual summarization and evaluation tracks in ACL 2013 MultiLing workshop. In: Proceedings of the MultiLing 2013 workshop on multilingual multi-document summarization. Association for Computational Linguistics, Sofia, Bulgaria, pp 20–28

  • Giannakopoulos G, Karkaletsis V (2011) AutoSummENG and MeMoG in evaluating guided summaries. In: Proceedings of the text analysis conference (TAC) 2011, MultiLing Summarisation Pilot, Maryland, USA

  • Giannakopoulos G, Karkaletsis V (2013) Summary evaluation: together we stand NPowER-ed. In: Gelbukh A (ed) Computational linguistics and intelligent text processing, lecture notes in computer science, vol 7817. Springer, Berlin, pp 436–450

  • Giannakopoulos G, El-Haj M, Favre B, Litvak M, Steinberger J, Varma V (2012) TAC 2011 MultiLing pilot overview. In: Text analysis conference (TAC)

  • Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, SIGIR ’01, pp 19–25

  • Haboush A, Momani A, Al-Zoubi M, Tarazi M (2012) Arabic text summerization model using clustering techniques. World Comput Sci Inf Technol J 2(2):62–67

  • Ibrahim A, Elghazaly T (2013a) Improve the automatic summarization of Arabic text depending on rhetorical structure theory. In: 2013 12th Mexican international conference on artificial intelligence (MICAI), pp 223–227

  • Ibrahim A, Elghazaly T (2013b) Rhetorical representation and vector representation in summarizing Arabic text. In: Natural language processing and information systems, lecture notes in computer science, vol 7934. Springer, Berlin, pp 421–424

  • Imam I, Hamouda A, Khalek HAA (2013) An ontology-based summarization system for Arabic documents (OSSAD). Int J Comput Appl 74(17):38–43

  • Inouye D, Kalita J (2011) Comparing Twitter summarization algorithms for multiple post summaries. In: Privacy, security, risk and trust (PASSAT), 2011 IEEE third international conference on and 2011 IEEE 3rd international conference on social computing (SocialCom), pp 298–306

  • Ismail S, Moawd I, Aref M (2013) Arabic text representation using rich semantic graph: A case study. In: Proceedings of the 4th European conference of computer science (ECCS ’13), pp 148–153

  • Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst 20(4):422–446

  • Jones KS, Galliers JR (1996) Evaluating natural language processing systems: an analysis and review. Springer, Secaucus, NJ, USA

  • Keskes I, Boudabous MM, Maaloul MH, Hadrich Belguith L (2012) Étude comparative entre trois approches de résumé automatique de documents arabes (comparative study of three approaches to automatic summarization of Arabic documents) [in French]. In: Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, vol 2: TALN. ATALA/AFCP, Grenoble, France, pp 225–238

  • Li L, Forascu C, El-Haj M, Giannakopoulos G (2013) Multi-document multilingual summarization corpus preparation, part 1: Arabic, English, Greek, Chinese, Romanian. In: Proceedings of the MultiLing 2013 workshop on multilingual multi-document summarization. Association for Computational Linguistics, Sofia, Bulgaria, pp 1–12

  • Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Moens MF, Szpakowicz S (eds) Text summarization branches out: proceedings of the ACL-04 workshop. Association for Computational Linguistics, Barcelona, Spain, pp 74–81

  • Lin CY, Hovy E (2002) Manual and automatic evaluation of summaries. In: Proceedings of the ACL-02 workshop on automatic summarization, vol 4. Association for Computational Linguistics, Stroudsburg, PA, USA, AS ’02, pp 45–51

  • Lloret E, Palomar M (2012) Text summarisation in progress: a literature review. Artif Intell Rev 37(1):1–41

  • Lloret E, Palomar M (2013) Tackling redundancy in text summarization through different levels of language analysis. Comput Stand Interfaces 35(5):507–518

  • Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165

  • Mani I (2001) Automatic summarization, vol 3. John Benjamins Publishing, Amsterdam

  • McKeown K, Radev DR (1995) Generating summaries of multiple news articles. In: Proceedings of the 18th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, SIGIR ’95, pp 74–82

  • Mehdad Y, Magnini B (2009) Optimizing textual entailment recognition using particle swarm optimization. In: Proceedings of the 2009 workshop on applied textual inference. Association for Computational Linguistics, pp 36–43

  • Mima H, Ananiadou S (2000) An application and evaluation of the C/NC-value approach for the automatic term recognition of multi-word units in Japanese. Int J Terminol 6(2):175–194

  • Nenkova A (2006) Summarization evaluation for text and speech: issues and approaches. In: INTERSPEECH

  • Nenkova A, Passonneau R (2004) Evaluating content selection in summarization: the pyramid method. In: Dumais S, Marcu D, Roukos S (eds) HLT-NAACL 2004: main proceedings. Association for Computational Linguistics, Boston, Massachusetts, USA, pp 145–152

  • Nguyen KH, Ock CY (2013) Word sense disambiguation as a traveling salesman problem. Artif Intell Rev 40(4):405–427

  • Oufaida H, Nouali O, Blache P (2014) Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization. J King Saud Univ Comput Inf Sci 26(4):450–461. Special issue on Arabic NLP

  • Over P, Dang H, Harman D (2007) DUC in context. Inf Process Manag 43(6):1506–1520

  • Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238

  • Radev D, Allison T, Blair-Goldensohn S, Blitzer J, Çelebi A, Dimitrov S, Drabek E, Hakim A, Lam W, Liu D, Otterbacher J, Qi H, Saggion H, Teufel S, Topper M, Winkel A, Zhang Z (2004) MEAD—a platform for multidocument multilingual text summarization. In: Conference on language resources and evaluation (LREC), Lisbon, Portugal

  • Radev DR, Hovy E, McKeown K (2002) Introduction to the special issue on summarization. Comput Linguist 28(4):399–408

  • Rayson P, Garside R (2000) Comparing corpora using frequency profiling. In: Proceedings of the workshop on comparing corpora, vol 9. Association for Computational Linguistics, Stroudsburg, PA, USA, WCC ’00, pp 1–6

  • Ryding K (2005) A reference grammar of modern standard Arabic. Cambridge University Press, Cambridge

  • Saggion H, Poibeau T (2013) Automatic text summarization: Past, present and future. In: Poibeau T, Saggion H, Piskorski J, Yangarber R (eds) Multi-source, multilingual information extraction and summarization, theory and applications of natural language processing. Springer, Berlin, pp 3–21

  • Schlesinger J, O’Leary D, Conroy J (2008) Arabic/English multi-document summarization with CLASSY: the past and the future. In: Gelbukh A (ed) Computational linguistics and intelligent text processing, lecture notes in computer science, vol 4919. Springer, Berlin, pp 568–581

  • Shaheen M, Ezzeldin A (2014) Arabic question answering: systems, resources, tools, and future trends. Arabian J Sci Eng 39(6):4541–4564

  • Sobh I, Darwish N, Fayek M (2007) An optimized dual classification system for Arabic extractive generic text summarization. In: Proceeding of the 7th conference on language engineering

  • Vanderwende L, Suzuki H, Brockett C, Nenkova A (2007) Beyond SumBasic: task-focused summarization with sentence simplification and lexical expansion. Inf Process Manage 43(6):1606–1618

  • Wu JW, Tseng J, Tsai WN (2010) A discrete particle swarm optimization algorithm for domain independent linear text segmentation. In: IEEE international conference on Granular computing (GrC), 2010, pp 519–524

  • Yih Wt, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. In: Proceedings of the 20th international joint conference on artificial intelligence. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI’07, pp 1776–1782

Acknowledgments

This work was supported by the Research Center of College of Computer and Information Sciences, King Saud University. The authors are grateful for this support. Also, the authors would like to thank the anonymous reviewers for their constructive comments.

Author information

Corresponding author

Correspondence to Asma Bader Al-Saleh.

About this article

Cite this article

Al-Saleh, A.B., Menai, M.E.B. Automatic Arabic text summarization: a survey. Artif Intell Rev 45, 203–234 (2016). https://doi.org/10.1007/s10462-015-9442-x

  • DOI: https://doi.org/10.1007/s10462-015-9442-x
