ABSTRACT
Arabic text summarization is a subfield of natural language processing. Many algorithms for summarizing Arabic text have been developed, but they suffer from weaknesses that stem from the particular difficulties of processing the Arabic language.
In this research we developed a new algorithm for Arabic text summarization that combines two models: a statistical model and a semantic model. The statistical model generates the candidate sentences for the summary, and the semantic model selects from those candidates to produce the final summary. The statistical model uses statistical features of sentences; the semantic model uses word2vec (word2vector), which converts words to numeric vectors while preserving the context of the text. The algorithm was tested on the Essex Arabic Summaries Corpus (EASC) dataset, and the tests confirmed its efficiency and accuracy, with summarization precision reaching 75%.
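The two-stage idea described above can be illustrated with a minimal sketch. The abstract does not say which statistical features are used or how the semantic model ranks candidates, so everything specific here is an assumption made for illustration: the features (average term frequency plus a position bonus), the averaging of word vectors into a sentence vector, the ranking by cosine similarity to the document centroid, and all function names. The `embeddings` dictionary is a toy stand-in for a trained word2vec model, and whitespace tokenization stands in for proper Arabic preprocessing.

```python
import math
from collections import Counter


def statistical_score(tokens, position, n_sentences, term_freq):
    """Assumed features: average term frequency plus a bonus for early sentences."""
    if not tokens:
        return 0.0
    avg_tf = sum(term_freq[t] for t in tokens) / len(tokens)
    position_bonus = 1.0 - position / n_sentences  # first sentence gets 1.0
    return avg_tf + position_bonus


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, embeddings, dim, n_candidates, n_summary):
    """Stage 1 picks candidates statistically; stage 2 filters them semantically."""
    tokenized = [s.split() for s in sentences]  # placeholder tokenizer
    term_freq = Counter(t for toks in tokenized for t in toks)
    n = len(sentences)

    # Stage 1 (statistical): keep the top-scoring sentences as candidates.
    ranked = sorted(
        range(n),
        key=lambda i: statistical_score(tokenized[i], i, n, term_freq),
        reverse=True,
    )
    candidates = ranked[:n_candidates]

    # Stage 2 (semantic): rank candidates by similarity to the document centroid.
    all_tokens = [t for toks in tokenized for t in toks]
    doc_vec = sentence_vector(all_tokens, embeddings, dim)
    by_similarity = sorted(
        candidates,
        key=lambda i: cosine(sentence_vector(tokenized[i], embeddings, dim), doc_vec),
        reverse=True,
    )

    # Return the selected sentences in their original document order.
    chosen = sorted(by_similarity[:n_summary])
    return [sentences[i] for i in chosen]
```

In practice the toy dictionary would be replaced by vectors from a word2vec model trained on Arabic text, and the tokenizer by stemming and stop-word removal; neither choice is specified in the abstract.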