DOI: 10.1145/3605423.3605444
research-article

Method for Arabic text Summarization using statistical features and word2vector approach

Published: 20 August 2023

ABSTRACT

Arabic text summarization is a field of natural language processing. Many algorithms for summarizing Arabic text have been developed, but they suffer from weaknesses that stem from the particular characteristics of processing the Arabic language.

In this research we developed a new algorithm for Arabic text summarization that combines two main models: a statistical model and a semantic model. The statistical model generates the candidate sentences for the summary, and the semantic model selects among them to produce the final summary. The statistical model uses statistical features of sentences, and the semantic model uses word2vector (word2vec) technology, which converts words to numeric vectors while preserving the context of the text. The algorithm was tested on the Essex Arabic Summaries Corpus (EASC) dataset, and the tests confirmed its efficiency and accuracy: its summarization precision reached 75%.
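The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the statistical features, the candidate count, or the similarity measure, so the term-frequency and position features, the cosine similarity to the document centroid, the sentence-level `precision` helper, and every function and parameter name below are assumptions made for illustration. A tiny hand-written embedding dictionary stands in for a trained word2vec model.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., !, ? and the Arabic question mark (an assumption).
    return [s.strip() for s in re.split(r"[.!?؟]+", text) if s.strip()]


def statistical_scores(sentences):
    # Assumed features: mean document-level term frequency of the
    # sentence's words, plus a bonus for appearing early in the text.
    tokens = [s.split() for s in sentences]
    tf = Counter(w for sent in tokens for w in sent)
    scores = []
    for i, sent in enumerate(tokens):
        frequency = sum(tf[w] for w in sent) / len(sent)
        position = 1.0 / (i + 1)
        scores.append(frequency + position)
    return scores


def sentence_vector(sentence, embeddings, dim):
    # Average the vectors of the words that have an embedding.
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, embeddings, dim, n_candidates, n_final):
    sentences = split_sentences(text)
    # Stage 1 (statistical): keep the top-scoring sentences as candidates.
    scores = statistical_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:n_candidates]
    # Stage 2 (semantic): rank candidates by similarity to the whole document.
    doc_vec = sentence_vector(" ".join(sentences), embeddings, dim)
    by_similarity = sorted(
        candidates,
        key=lambda i: cosine(sentence_vector(sentences[i], embeddings, dim), doc_vec),
        reverse=True,
    )
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(by_similarity[:n_final])]


def precision(selected, reference):
    # Sentence-level precision (an assumption about the metric):
    # the fraction of selected sentences that appear in the reference summary.
    if not selected:
        return 0.0
    return len(set(selected) & set(reference)) / len(selected)
```

In practice the `embeddings` dictionary would be replaced by lookups into a word2vec model trained on Arabic text, and the sentence splitter and tokenizer by Arabic-aware preprocessing such as stemming and stop-word removal.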

    • Published in

      ACM Other conferences
      ICCTA '23: Proceedings of the 2023 9th International Conference on Computer Technology Applications
      May 2023
      270 pages
      ISBN: 9781450399579
      DOI: 10.1145/3605423

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited