SnTiEmd: Sentiment Specific Embedding Model Generation and Evaluation for a Resource Constraint Language

Afroze, Sadia; Hoque, Mohammed Moshiul

doi:10.1007/978-3-031-19958-5_23

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 569)

Included in the following conference series:

International Conference on Intelligent Computing & Optimization


Abstract

Sentiment analysis is a fundamental natural language processing (NLP) task, and for resource-constrained languages, feature extraction in a specific domain remains a challenging issue. Word embeddings are a good intermediate text-to-feature extraction method because they capture semantic regularities between words. However, general word embeddings are limited in capturing domain-specific semantics or knowledge for sentiment analysis. Moreover, for low-resource languages like Bengali, no sentiment-specific word embedding research has been conducted to date. This study developed a domain-based (sentiment-specific) embedding corpus (SeC) and built an intrinsic evaluation dataset. Three embedding methods (GloVe, FastText, and Word2Vec) are investigated to develop the sentiment-based embedding model (SnTiEmd). The resulting SnTiEmd models are evaluated on the intrinsic evaluation dataset for semantic and syntactic word similarity. The GloVe-based SnTiEmd performs best: its highest Pearson correlation is 55.66% for syntactic similarity and 52.97% for semantic similarity, and its highest Spearman correlation is 52.28% for syntactic and 55.19% for semantic word similarity.
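
The intrinsic evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the common setup in which each word pair has a human similarity rating and the model score is the cosine similarity between the two word vectors; the abstract does not state which similarity measure the authors used. The function names, the toy English words, the vectors, and the ratings below are all hypothetical and are not taken from the paper's Bengali dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def evaluate(embeddings, pairs):
    """Correlate model similarities with human ratings.

    Pairs containing an out-of-vocabulary word are skipped (an assumption).
    """
    model_scores, human_scores = [], []
    for w1, w2, human in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(human)
    return pearson(model_scores, human_scores), spearman(model_scores, human_scores)


# Hypothetical toy data, for illustration only.
toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.1],
    "awful": [-0.8, 0.2, 0.0],
    "table": [0.0, 0.1, 0.9],
}
toy_pairs = [
    ("good", "great", 9.0),
    ("bad", "awful", 8.5),
    ("good", "bad", 2.0),
    ("good", "table", 1.0),
    ("missing", "good", 5.0),  # skipped: "missing" has no vector
]

pearson_r, spearman_rho = evaluate(toy_embeddings, toy_pairs)
print(f"Pearson: {pearson_r:.4f}  Spearman: {spearman_rho:.4f}")
```

Pearson correlation measures linear agreement between the model scores and the human ratings, while Spearman correlation depends only on their rank order, which is why the two can differ on the same dataset. A percentage such as 55.66% would correspond to a coefficient of 0.5566 under this reading.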



Acknowledgement

This work was supported by the ICT Division, Ministry of Posts, Telecommunications & Information Technology, Bangladesh.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong, 4349, Bangladesh
Sadia Afroze & Mohammed Moshiul Hoque
Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, 1207, Bangladesh
Sadia Afroze

Authors

Sadia Afroze
Mohammed Moshiul Hoque

Corresponding author

Correspondence to Mohammed Moshiul Hoque.

Editor information

Editors and Affiliations

Modeling Evolutionary Algorithms Simulation and Artificial Intelligence, Faculty of Electrical and Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Pandian Vasant
Faculty of Engineering Management, Poznan University of Technology, Poznań, Wielkopolskie, Poland
Gerhard-Wilhelm Weber
Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
José Antonio Marmolejo-Saucedo
Department of Business Statistics and Operations Research, School of Economic Sciences, North West University, Mahikeng, South Africa
Elias Munapo
Department of Computer Science, UOW Malaysia KDU Penang University Colle, George Town, Malaysia
J. Joshua Thomas


About this paper

Cite this paper

Afroze, S., Hoque, M.M. (2023). SnTiEmd: Sentiment Specific Embedding Model Generation and Evaluation for a Resource Constraint Language. In: Vasant, P., Weber, GW., Marmolejo-Saucedo, J.A., Munapo, E., Thomas, J.J. (eds) Intelligent Computing & Optimization. ICO 2022. Lecture Notes in Networks and Systems, vol 569. Springer, Cham. https://doi.org/10.1007/978-3-031-19958-5_23


DOI: https://doi.org/10.1007/978-3-031-19958-5_23
Published: 21 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19957-8
Online ISBN: 978-3-031-19958-5
eBook Packages: Intelligent Technologies and Robotics (R0)
