ArWordVec: efficient word embedding models for Arabic tweets

Fouad, Mohammed M.; Mahany, Ahmed; Aljohani, Naif; Abbasi, Rabeeh Ayaz; Hassan, Saeed-Ul

doi:10.1007/s00500-019-04153-6

Focus
Published: 26 June 2019

Volume 24, pages 8061–8068, (2020)

Soft Computing

Mohammed M. Fouad ORCID: orcid.org/0000-0002-6369-6178¹,
Ahmed Mahany²,
Naif Aljohani³,
Rabeeh Ayaz Abbasi⁴ &
Saeed-Ul Hassan⁵

918 Accesses
21 Citations
6 Altmetric

Abstract

Understanding, processing and using human natural language is one of the major current advances in artificial intelligence. It has been achieved by combining natural language processing (NLP) techniques with deep learning approaches and architectures. In recent years, distributed word representations have replaced the traditional bag-of-words approach very effectively in many NLP tasks. In this paper, we present the detailed steps of building a set of efficient word embedding models, called ArWordVec, generated from a huge repository of Arabic tweets. We also introduce a new method for measuring Arabic word similarity and use it to evaluate the generated ArWordVec models. The experimental results show that the ArWordVec models outperform the recently available models on Arabic Twitter data in the word similarity task. In addition, two large Arabic tweet datasets are used to examine the performance of the proposed models in multi-class sentiment analysis. The results show that the proposed models are very efficient and help achieve a classification accuracy exceeding 73.86% with a high average F1 value of 74.15.
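
As a rough illustration of what a word similarity task measures, the sketch below ranks words by the cosine similarity of their embedding vectors. It is a minimal example under stated assumptions, not the new similarity method the abstract refers to, whose details are not given here. The function names, the transliterated words and the three-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings (transliterated Arabic words), invented for
# illustration; real embedding models use hundreds of dimensions.
toy_vectors = {
    "kitab": [0.9, 0.1, 0.0],    # "book"
    "maktaba": [0.8, 0.3, 0.1],  # "library"
    "sayyara": [0.0, 0.2, 0.9],  # "car"
}

if __name__ == "__main__":
    for other, score in most_similar("kitab", toy_vectors):
        print(f"{other}: {score:.3f}")
```

With these toy vectors, "maktaba" ranks above "sayyara" for the query "kitab", which is the kind of ordering a word similarity benchmark compares against human judgments.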

Acknowledgements

We thank the administration of the High Performance Computing Center (HPCC) at King Abdulaziz University, Jeddah, Saudi Arabia, for their support and for access to the Aziz Supercomputer. Our experiments require both huge computing capabilities and storage space, and this access helped us perform them.

Author information

Authors and Affiliations

Fujitsu Technology Solutions, Dubai, United Arab Emirates
Mohammed M. Fouad
Faculty of Computers and Information Sciences, Ain Shams University, Cairo, Egypt
Ahmed Mahany
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Naif Aljohani
Department of Computer Science, Quaid-i-Azam University, Islamabad, Pakistan
Rabeeh Ayaz Abbasi
Information Technology University, Lahore, Pakistan
Saeed-Ul Hassan

Corresponding author

Correspondence to Mohammed M. Fouad.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Mu-Yen Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fouad, M.M., Mahany, A., Aljohani, N. et al. ArWordVec: efficient word embedding models for Arabic tweets. Soft Comput 24, 8061–8068 (2020). https://doi.org/10.1007/s00500-019-04153-6

Published: 26 June 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00500-019-04153-6
