research-article

Method for Arabic text Summarization using statistical features and word2vector approach

Authors:

Mohamad Al-Shaar

ICCTA '23: Proceedings of the 2023 9th International Conference on Computer Technology Applications

Pages 258 - 262

https://doi.org/10.1145/3605423.3605444

Published: 20 August 2023

Abstract

Arabic text summarization is a field of natural language processing. Many algorithms for summarizing Arabic text have been developed, but they suffer from weaknesses that arise from the specific characteristics of processing the Arabic language.

In this research, we have developed a new algorithm for Arabic text summarization. The algorithm contains two main summarization models: a statistical model and a semantic model. The statistical model uses sentence statistical features to generate candidate sentences for the summary. The semantic model then selects among these candidates and generates the final summary using word2vec, a technique that converts words into numerical vectors while preserving the context of the text. The algorithm was tested on the Essex Arabic Summaries Corpus (EASC) dataset, and the tests confirmed its efficiency and accuracy, with summarization precision reaching 75%.
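The two-stage design described in the abstract can be illustrated with a small Python sketch. This is not the author's implementation. It is a minimal example under explicit assumptions: the statistical features (normalized term frequency, sentence position, sentence length), the `candidate_ratio` and `summary_size` parameters, the semantic selection rule (cosine similarity between averaged word vectors and the document centroid), and sentence-level precision against a reference summary are all hypothetical choices. The `TOY_VECTORS` table is a hand-made stand-in for a trained word2vec model, and English placeholder sentences are used only so that the toy vectors stay readable.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for a trained word2vec model (word -> dense vector).
# A real system would load vectors trained on Arabic text instead.
TOY_VECTORS = {
    "summarization": [0.9, 0.1, 0.0],
    "arabic": [0.8, 0.2, 0.1],
    "text": [0.7, 0.3, 0.0],
    "weather": [0.0, 0.1, 0.9],
    "sunny": [0.1, 0.0, 0.8],
}


def split_sentences(text):
    """Naive splitter on '.', '!', '?' and the Arabic question mark."""
    parts = re.split(r"[.!?\u061F]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # \w also matches Arabic letters, because Python 3 str regexes are Unicode-aware.
    return re.findall(r"\w+", sentence.lower())


def statistical_scores(sentences):
    """Stage 1 (ASSUMED features): term frequency + position + length."""
    tokens = [tokenize(s) for s in sentences]
    tf = Counter(w for words in tokens for w in words)
    if not tf:
        return [0.0] * len(sentences)
    max_tf = max(tf.values())
    max_len = max(len(words) for words in tokens)
    scores = []
    for i, words in enumerate(tokens):
        tf_score = sum(tf[w] for w in words) / (max(len(words), 1) * max_tf)
        position = 1.0 - i / len(sentences)  # earlier sentences score higher
        length = len(words) / max_len
        scores.append(tf_score + position + length)
    return scores


def sentence_vector(sentence, vectors, dim):
    """Average the vectors of in-vocabulary words (zero vector if none)."""
    vecs = [vectors[w] for w in tokenize(sentence) if w in vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summarize(text, vectors, candidate_ratio=0.75, summary_size=2):
    sentences = split_sentences(text)
    # Stage 1: the statistical model proposes the top-scoring candidates.
    scores = statistical_scores(sentences)
    n_candidates = max(summary_size, round(len(sentences) * candidate_ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:n_candidates]
    # Stage 2: the semantic model keeps the candidates closest to the
    # document centroid (mean vector of all in-vocabulary words).
    dim = len(next(iter(vectors.values())))
    doc_vec = sentence_vector(" ".join(sentences), vectors, dim)
    best = sorted(
        candidates,
        key=lambda i: cosine(sentence_vector(sentences[i], vectors, dim), doc_vec),
        reverse=True,
    )[:summary_size]
    return [sentences[i] for i in sorted(best)]  # restore document order


def precision(system, reference):
    """Sentence-level precision: fraction of selected sentences found in the reference."""
    if not system:
        return 0.0
    return len(set(system) & set(reference)) / len(system)


text = ("Arabic text summarization is useful. The weather is sunny today. "
        "Summarization of Arabic text saves time. Text summarization helps readers.")
summary = summarize(text, TOY_VECTORS)
print(summary)
# ['Arabic text summarization is useful', 'Summarization of Arabic text saves time']
print(precision(summary, ["Arabic text summarization is useful",
                          "Text summarization helps readers"]))  # 0.5
```

In this toy run, the statistical stage keeps three of the four sentences as candidates, and the semantic stage then drops the off-topic weather sentence because its averaged vector points away from the document centroid. Using real embeddings (for example, gensim `KeyedVectors`, which supports `word in kv` and `kv[word]`) would also require taking the dimension from the model's `vector_size` attribute instead of the `next(iter(vectors.values()))` shortcut used here.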

Cited By

Liu J, Liang Z, Mu C, Zhang L, Atkins A (2025). Knowledge graph-based reasoning in medical healthcare scenarios for IoT applications. Sensor Networks for Smart Hospitals, pp. 535-550. Online publication date: 2025. https://doi.org/10.1016/B978-0-443-36370-2.00026-8

Albokae N, AlKhtib B, Omar K (2023). Hybrid Method for ICD Prediction Using Word Embedding and Natural Language Processing. 2023 24th International Arab Conference on Information Technology (ACIT), pp. 1-5. Online publication date: 6 Dec 2023. https://doi.org/10.1109/ACIT58888.2023.10453813

Index Terms

1. Computing methodologies

Published In

ACM Other conferences

ICCTA '23: Proceedings of the 2023 9th International Conference on Computer Technology Applications

May 2023

270 pages

ISBN:9781450399579

DOI:10.1145/3605423

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICCTA 2023

ICCTA 2023: 2023 9th International Conference on Computer Technology Applications

May 10 - 12, 2023

Vienna, Austria

Bibliometrics

Total Citations: 2
Total Downloads: 23
Downloads (last 12 months): 10
Downloads (last 6 weeks): 1

Reflects downloads up to 18 Feb 2025
