Hybrid method for text summarization based on statistical and semantic treatment

Alami, Nabil; Mallahi, Mostafa El; Amakdouf, Hicham; Qjidaa, Hassan

doi:10.1007/s11042-021-10613-9

Published: 28 February 2021

Volume 80, pages 19567–19600, (2021)

Multimedia Tools and Applications

Nabil Alami¹,
Mostafa El Mallahi ORCID: orcid.org/0000-0001-9735-6799²,
Hicham Amakdouf¹ &
…
Hassan Qjidaa¹

Abstract

Text summarization presents several challenges, such as accounting for semantic relationships among words and handling redundancy and information diversity. To overcome these problems, we propose in this paper a new graph-based Arabic summarization system that combines statistical and semantic analysis. The proposed approach uses the hierarchical structure and relations of an ontology to provide a more accurate similarity measurement between terms and thereby improve the quality of the summary. The method is based on a two-dimensional graph model that makes use of statistical and semantic similarities. The statistical similarity is based on the content overlap between two sentences, while the semantic similarity is computed using semantic information extracted from a lexical database, which enables our system to apply reasoning by measuring the semantic distance between real human concepts. The weighted ranking algorithm PageRank is run on the graph to produce a significance score for every sentence in the document. The score of each sentence is then refined by adding other statistical features. In addition, we address redundancy and information diversity by using an adapted version of the Maximal Marginal Relevance method. Experimental results on EASC and our own datasets show the effectiveness of the proposed approach over existing summarization systems.
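The pipeline outlined in the abstract (a sentence graph weighted by statistical and semantic similarity, ranked with weighted PageRank, then filtered for redundancy with Maximal Marginal Relevance) can be illustrated with a minimal sketch. This is not the authors' implementation, and every specific here is an assumption: Jaccard overlap stands in for the statistical similarity, a caller-supplied `word_sim` function stands in for the lexical-database measure, the two are mixed linearly with a hypothetical weight `alpha`, and plain MMR is used rather than the adapted version the paper describes. The additional statistical features mentioned in the abstract are omitted.

```python
def jaccard(a, b):
    """Content overlap between two tokenized sentences (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def semantic_sim(a, b, word_sim):
    """Symmetric average of best word-to-word similarities.

    word_sim is a hypothetical stand-in for a lexical-database measure.
    """
    if not a or not b:
        return 0.0

    def directed(x, y):
        return sum(max(word_sim(u, v) for v in y) for u in x) / len(x)

    return 0.5 * (directed(a, b) + directed(b, a))


def build_graph(sentences, word_sim, alpha=0.5):
    """Symmetric weight matrix mixing both similarities (alpha is assumed)."""
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            stat = jaccard(sentences[i], sentences[j])
            sem = semantic_sim(sentences[i], sentences[j], word_sim)
            w[i][j] = w[j][i] = alpha * stat + (1 - alpha) * sem
    return w


def weighted_pagerank(w, damping=0.85, iters=50):
    """Power iteration for PageRank on a weighted undirected graph."""
    n = len(w)
    out = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def mmr_select(scores, w, k, lam=0.5):
    """Greedy MMR: trade relevance against similarity to chosen sentences."""
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda i: lam * scores[i]
            - (1 - lam) * max((w[i][s] for s in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order


# Toy usage: exact word match stands in for the lexical-database similarity.
sentences = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "rug"],
    ["stocks", "fell", "today"],
]
exact_match = lambda u, v: 1.0 if u == v else 0.0
w = build_graph(sentences, exact_match)
scores = weighted_pagerank(w)
picked = mmr_select(scores, w, k=2)
print(picked)
```

With these toy inputs the first two sentences are near duplicates, so the redundancy penalty should lead the selection to pair one of them with the unrelated third sentence rather than returning both. In a real system, `word_sim` would query a lexical resource, and `alpha` and `lam` would need tuning on held-out data.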

Author information

Authors and Affiliations

LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, PO Box 1796, Fez, 30003, Morocco
Nabil Alami, Hicham Amakdouf & Hassan Qjidaa
High Normal School, Mathematics and computer sciences Department, Laboratory of Computer Science and Interdisciplinary Physics, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Mostafa El Mallahi

Corresponding author

Correspondence to Nabil Alami.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alami, N., Mallahi, M.E., Amakdouf, H. et al. Hybrid method for text summarization based on statistical and semantic treatment. Multimed Tools Appl 80, 19567–19600 (2021). https://doi.org/10.1007/s11042-021-10613-9

Received: 25 June 2020
Revised: 16 October 2020
Accepted: 25 January 2021
Published: 28 February 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s11042-021-10613-9
