Abstract
This chapter addresses automatic summarization of Semitic languages. After presenting the theoretical background and current challenges of automatic summarization, we describe the approaches proposed to address these challenges. We then discuss the main approaches for Semitic languages (chiefly Arabic, Hebrew, Maltese and Amharic). Finally, we present a case study of a specific Arabic automatic summarization system.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Belguith, L.H., Ellouze, M., Maaloul, M.H., Jaoua, M., Jaoua, F.K., Blache, P. (2014). Automatic Summarization. In: Zitouni, I. (ed.) Natural Language Processing of Semitic Languages. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45358-8_12
DOI: https://doi.org/10.1007/978-3-642-45358-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45357-1
Online ISBN: 978-3-642-45358-8
eBook Packages: Computer Science, Computer Science (R0)