Abstract
Sentiment analysis remains a primitive area of natural language processing (NLP) research for resource-constrained languages, where feature extraction in a specific domain is a challenging issue. Word embeddings are good intermediate text-to-feature extraction methods because they capture semantic regularities between words. However, general word embeddings are limited in capturing the domain-specific semantics or knowledge needed for sentiment analysis. Moreover, for low-resource languages like Bengali, no sentiment-specific word embedding research has been conducted to date. This study developed a domain-based, i.e., sentiment-specific, embedding corpus (SeC) and built an intrinsic evaluation dataset. Three embedding methods (GloVe, FastText, and Word2Vec) are investigated to develop the sentiment-based embedding model (SnTiEmd). The resulting SnTiEmd models (GloVe, FastText, and Word2Vec) are evaluated on the intrinsic evaluation dataset, which covers semantic and syntactic word similarity. The highest Pearson correlation is 55.66% for syntactic similarity and 52.97% for semantic similarity, whereas the highest Spearman correlation is 52.28% and 55.19% for syntactic and semantic word similarity, respectively, using the GloVe-based SnTiEmd.
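The intrinsic evaluation summarized above can be illustrated with a minimal sketch. It assumes, as is common for word-similarity benchmarks, that each word pair carries a human similarity score and that the model's score is the cosine similarity of the two word vectors; the abstract does not state the exact protocol. The toy vectors, the English placeholder words, the human scores, and the function names (`cosine`, `pearson`, `ranks`, `spearman`, `evaluate`) are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def evaluate(embeddings, pairs):
    """Correlate model cosine similarities with human scores.

    Pairs containing an out-of-vocabulary word are skipped (an assumption).
    """
    model_scores, human_scores = [], []
    for w1, w2, human in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(human)
    return pearson(model_scores, human_scores), spearman(model_scores, human_scores)

# Hypothetical 2-dimensional vectors, for illustration only.
toy_embeddings = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.7, 0.3],
    "awful": [-0.8, 0.2],
}
# Hypothetical (word1, word2, human similarity score) triples.
toy_pairs = [
    ("good", "great", 9.0),
    ("bad", "awful", 8.5),
    ("good", "bad", 1.5),
    ("great", "awful", 1.0),
    ("good", "missing", 5.0),  # skipped: "missing" has no vector
]

pearson_r, spearman_rho = evaluate(toy_embeddings, toy_pairs)
print(f"Pearson: {pearson_r:.4f}, Spearman: {spearman_rho:.4f}")
```

Multiplying either coefficient by 100 gives a percentage of the kind reported in the abstract. In practice a library routine such as those in SciPy would likely replace the hand-written correlation functions; they are spelled out here only to keep the sketch dependency-free.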
Acknowledgement
This work was supported by ICT Division, Ministry of Posts, Telecommunications & Information Technology, Bangladesh.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Afroze, S., Hoque, M.M. (2023). SnTiEmd: Sentiment Specific Embedding Model Generation and Evaluation for a Resource Constraint Language. In: Vasant, P., Weber, GW., Marmolejo-Saucedo, J.A., Munapo, E., Thomas, J.J. (eds) Intelligent Computing & Optimization. ICO 2022. Lecture Notes in Networks and Systems, vol 569. Springer, Cham. https://doi.org/10.1007/978-3-031-19958-5_23
DOI: https://doi.org/10.1007/978-3-031-19958-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19957-8
Online ISBN: 978-3-031-19958-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)