DOI: 10.1145/3605423.3605444
research-article

Method for Arabic text Summarization using statistical features and word2vector approach

Published: 20 August 2023

Abstract

Arabic text summarization is a field of natural language processing. Many algorithms for summarizing Arabic texts have been developed, but they suffer from weaknesses that stem from the particular difficulties of processing the Arabic language.
In this research we developed a new algorithm for Arabic text summarization that combines two models: a statistical model and a semantic model. The statistical model generates the candidate sentences for the summary, and the semantic model selects among them to produce the final summary. The statistical model uses statistical features of sentences; the semantic model uses word2vec (word2vector) technology, which converts words to numeric vectors while preserving the context of the text. The algorithm was tested on the Essex Arabic Summaries Corpus (EASC) dataset, and the tests confirmed its efficiency and accuracy: its summarization precision reached 75%.
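The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical features or the semantic selection rule, so the features used here (average term frequency and sentence position), the centroid-based cosine similarity, and all function names are assumptions made for illustration. The `embeddings` dictionary stands in for a trained word2vec model, and the sentence splitter is deliberately naive.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on '.', '!', '?' and the Arabic question mark.
    # Assumption: real Arabic text would need a proper tokenizer.
    return [s.strip() for s in re.split(r"[.!?؟]+", text) if s.strip()]


def statistical_score(sentence, index, term_freq):
    # Hypothetical features: average term frequency plus a position bonus
    # that favors sentences near the start of the document.
    words = sentence.split()
    avg_tf = sum(term_freq[w] for w in words) / len(words)
    return avg_tf + 1.0 / (index + 1)


def sentence_vector(sentence, embeddings, dim):
    # Average the vectors of the words that have an embedding.
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, embeddings, dim, n_candidates, n_final):
    sentences = split_sentences(text)
    term_freq = Counter(w for s in sentences for w in s.split())

    # Stage 1 (statistical): keep the top-scoring sentences as candidates.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: statistical_score(sentences[i], i, term_freq),
        reverse=True,
    )
    candidates = ranked[:n_candidates]

    # Stage 2 (semantic): keep the candidates closest to the document centroid.
    doc_vec = sentence_vector(" ".join(sentences), embeddings, dim)
    by_similarity = sorted(
        candidates,
        key=lambda i: cosine(sentence_vector(sentences[i], embeddings, dim), doc_vec),
        reverse=True,
    )
    chosen = sorted(by_similarity[:n_final])  # restore document order
    return [sentences[i] for i in chosen]


def precision(selected, reference):
    # Fraction of selected sentences that also appear in the reference summary.
    if not selected:
        return 0.0
    return len(set(selected) & set(reference)) / len(selected)


# Toy data (assumption): English tokens and 2-dimensional vectors for readability.
toy_embeddings = {
    "summary": [1.0, 0.0], "text": [0.9, 0.1],
    "weather": [0.0, 1.0], "rain": [0.1, 0.9],
}
toy_text = "summary text matters. rain weather today. text summary again. weather rain."
print(summarize(toy_text, toy_embeddings, dim=2, n_candidates=3, n_final=2))
```

Under this definition of precision, a system summary of four sentences of which three appear in the reference summary scores 3/4 = 0.75, which is the level reported in the abstract. In practice the toy dictionary would be replaced by vectors from a word2vec model trained on Arabic text.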


Cited By

  • (2025) Knowledge graph-based reasoning in medical healthcare scenarios for IoT applications. Sensor Networks for Smart Hospitals, pp. 535-550. DOI: 10.1016/B978-0-443-36370-2.00026-8. Online publication date: 2025
  • (2023) Hybrid Method for ICD Prediction Using Word Embedding and Natural Language Processing. 2023 24th International Arab Conference on Information Technology (ACIT), pp. 1-5. DOI: 10.1109/ACIT58888.2023.10453813. Online publication date: 6-Dec-2023


    Published In

    ICCTA '23: Proceedings of the 2023 9th International Conference on Computer Technology Applications
    May 2023
    270 pages
    ISBN: 9781450399579
    DOI: 10.1145/3605423

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Arabic WordNet
    2. Arabic text summarization
    3. Arabic word2vector

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICCTA 2023


