Abstract
The quality of word representations is crucial to obtaining good results in many natural language processing tasks. Recently, many word representation models (word embeddings), such as fastText, have been developed. In this research, we compared two implementations of fastText, Facebook’s official implementation and Gensim’s implementation, using the same pre-trained fastText model. We evaluated the embeddings produced by each implementation on a multi-class classification task. According to the results, the Facebook implementation performed better than Gensim’s implementation: average accuracy was 78.22% versus 56.73% for sentence embeddings, and 79.43% versus 57.95% for word embeddings.
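The comparison described above can be illustrated with a minimal sketch, which is not the authors' code. It assumes that each implementation is exposed as a word-to-vector lookup function, that a sentence embedding is the average of its word vectors, and that a simple nearest-centroid classifier stands in for whatever classifiers the paper actually used. The toy vocabulary, labels, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Lookup = Callable[[str], Vector]          # hypothetical: word -> embedding
Sample = Tuple[Sequence[str], str]        # (tokens, class label)


def sentence_embedding(tokens: Sequence[str], lookup: Lookup) -> Vector:
    """Average the word vectors of one tokenized sentence (an assumption)."""
    if not tokens:
        raise ValueError("cannot embed an empty sentence")
    vectors = [lookup(token) for token in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit_centroids(samples: Sequence[Sample], lookup: Lookup) -> Dict[str, Vector]:
    """Compute one mean sentence embedding per class."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for tokens, label in samples:
        grouped[label].append(sentence_embedding(tokens, lookup))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(tokens: Sequence[str], centroids: Dict[str, Vector], lookup: Lookup) -> str:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    embedding = sentence_embedding(tokens, lookup)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(embedding, centroids[label])),
    )


def accuracy(samples: Sequence[Sample], centroids: Dict[str, Vector], lookup: Lookup) -> float:
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(tokens, centroids, lookup) == label for tokens, label in samples)
    return correct / len(samples)


# Toy stand-in for a pre-trained model; real vectors would come from fastText.
TOY_VECTORS: Dict[str, Vector] = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1],
    "bank": [0.0, 1.0], "loan": [0.1, 0.9],
}


def toy_lookup(word: str) -> Vector:
    return TOY_VECTORS[word]


# Possible wiring to the real libraries (not run here; "model.bin" is a placeholder):
#   import fasttext
#   fb_lookup = fasttext.load_model("model.bin").get_word_vector
#   from gensim.models.fasttext import load_facebook_vectors
#   gensim_vectors = load_facebook_vectors("model.bin")
#   gensim_lookup = lambda word: gensim_vectors[word]

train = [(["goal", "match"], "sports"), (["bank", "loan"], "economy")]
test = [(["match"], "sports"), (["loan"], "economy")]

centroids = fit_centroids(train, toy_lookup)
toy_accuracy = accuracy(test, centroids, toy_lookup)
print(f"accuracy with toy lookup: {toy_accuracy:.2%}")
```

Running the same `fit_centroids` and `accuracy` calls once per lookup function, on identical training and test splits, gives a like-for-like accuracy figure for each implementation, which is the kind of comparison the abstract reports.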
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Alghamdi, N., Assiri, F. (2020). A Comparison of fastText Implementations Using Arabic Text Classification. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1038. Springer, Cham. https://doi.org/10.1007/978-3-030-29513-4_21
DOI: https://doi.org/10.1007/978-3-030-29513-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29512-7
Online ISBN: 978-3-030-29513-4
eBook Packages: Intelligent Technologies and Robotics (R0)