Improving Sentiment Classification in Low-Resource Bengali Language Utilizing Cross-Lingual Self-supervised Learning

Sazzed, Salim

doi:10.1007/978-3-030-80599-9_20

Salim Sazzed¹²

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12801))

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

1666 Accesses
3 Citations

Abstract

One of the barriers of sentiment analysis research in low-resource languages such as Bengali is the lack of annotated data. Manual annotation requires resources, which are scarcely available in low-resource languages. We present a cross-lingual hybrid methodology that utilizes machine translation and prior sentiment information to generate accurate pseudo-labels. By leveraging the pseudo-labels, a supervised ML classifier is trained for sentiment classification. We contrast the performance of the proposed self-supervised methodology with the Bengali and English sentiment classification methods (i.e., methods which do not require labeled data). We observe that the self-supervised hybrid methodology improves the macro F1 scores by 15%–25%. The results infer that the proposed framework can improve the performance of sentiment classification in low-resource languages that lack labeled data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Abdi, A., Shamsuddin, S.M., Hasan, S., Piran, J.: Deep learning-based sentiment classification of evaluative text based on multi-feature fusion. Inf. Process. Manag. 56(4), 1245–1259 (2019)
Google Scholar
Al-Amin, M., Islam, M.S., Uzzal, S.D.: Sentiment analysis of Bengali comments with word2vec and sentiment information of words. In: 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 186–190, February 2017. https://doi.org/10.1109/ECACE.2017.7912903
Balahur, A., Turchi, M.: Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis. Comput. Speech Lang. 28(1), 56–75 (2014). https://doi.org/10.1016/j.csl.2013.03.004
Balamurali, A., Joshi, A., Bhattacharyya, P.: Cross-lingual sentiment analysis for Indian languages using linked wordnets. In: COLING (2012)
Google Scholar
Banea, C., Mihalcea, R., Wiebe, J., Hassan, S.: Multilingual subjectivity analysis using machine translation. In: 2008 Conference on Empirical Methods in Natural Language Processing, pp. 127–135 (2008)
Google Scholar
Chen, X., Sun, Y., Athiwaratkun, B., Cardie, C., Weinberger, K.: Adversarial deep averaging networks for cross-lingual sentiment classification. Trans. Assoc. Comput. Linguist. 6, 557–570 (2018)
Google Scholar
Chowdhury, S., Chowdhury, W.: Performing sentiment analysis in Bangla microblog posts. In: 2014 International Conference on Informatics, Electronics Vision (ICIEV), pp. 1–6, May 2014
Google Scholar
Das, A., Bandyopadhyay, S.: Sentiwordnet for Bangla. Knowl. Sharing Event-4: Task 2, 1–8 (2010)
Google Scholar
Das, A., Bandyopadhyay, S.: Topic-based Bengali opinion summarization. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 232–240. Association for Computational Linguistics (2010)
Google Scholar
Feng, Y., Wan, X.: Towards a unified end-to-end approach for fully unsupervised cross-lingual sentiment analysis. In: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pp. 1035–1044. Hong Kong, China, November 2019
Google Scholar
Hassan, A., Amin, M.R., Al Azad, A.K., Mohammed, N.: Sentiment analysis on Bangla and romanized Bangla text using deep recurrent models. In: 2016 International Workshop on Computational Intelligence (IWCI), pp. 51–56. IEEE (2016)
Google Scholar
He, Y., Zhou, D.: Self-training from labeled features for sentiment analysis. Inf. Process. Manag. 47(4), 606–616 (2011)
Google Scholar
Hutto, C.J., Gilbert, E.: Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth International AAAI Conference on Weblogs and Social Media (2014)
Google Scholar
Islam, M.S., Islam, M.A., Hossain, M.A., Dey, J.J.: Supervised approach of sentimentality extraction from Bengali Facebook status. In: 2016 19th International Conference on Computer and Information Technology (ICCIT), pp. 383–387, December 2016
Google Scholar
Lusa, L., et al.: Smote for high-dimensional class-imbalanced data. BMC Bioinform. 14(1), 106 (2013)
Google Scholar
Meng, X., Wei, F., Liu, X., Zhou, M., Xu, G., Wang, H.: Cross-lingual mixture model for sentiment classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pp. 572–581 (2012)
Google Scholar
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)
Google Scholar
Patra, B.G., Das, D., Das, A., Prasath, R.: Shared task on sentiment analysis in Indian languages (SAIL) tweets - an overview. In: Prasath, R., Vuppala, A.K., Kathirvalavakumar, T. (eds.) MIKE 2015. LNCS (LNAI), vol. 9468, pp. 650–655. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26832-3_61
Chapter Google Scholar
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Google Scholar
Sazzed, S.: Cross-lingual sentiment classification in low-resource Bengali language. In: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pp. 50–60 (2020)
Google Scholar
Sazzed, S.: Development of sentiment lexicon in Bengali utilizing corpus and cross-lingual resources. In: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), pp. 237–244. IEEE (2020)
Google Scholar
Sazzed, S., Jayarathna, S.: A sentiment classification in Bengali and machine translated English corpus. In: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), pp. 107–114 (2019)
Google Scholar
Sazzed, S., Jayarathna, S.: Ssentia: a self-supervised sentiment analyzer for classification from unlabeled data. Mach. Learn. Appl. 4 (2021)
Google Scholar
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 61(12), 2544–2558 (2010)
Google Scholar
Tripto, N., Eunus Ali, M.: Detecting multilabel sentiment and emotions from Bangla Youtube comments. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–6 (2018)
Google Scholar
Turney, P.D.: Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417–424. Association for Computational Linguistics (2002)
Google Scholar
Xu, R., Yang, Y., Otani, N., Wu, Y.: Unsupervised cross-lingual transfer of word embedding spaces. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2465–2474. Brussels, Belgium, October-November 2018
Google Scholar
Zhang, L., Ghosh, R., Dekhil, M., Hsu, M., Liu, B.: Combining lexicon-based and learning-based methods for Twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011 89 (2011)
Google Scholar
Zhang, W., Zhao, K., Qiu, L., Hu, C.: Sess: a self-supervised and syntax-based method for sentiment classification. In: Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, vol. 2, pp. 596–605 (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Old Dominion University, Norfolk, VA, 23508, USA
Salim Sazzed

Authors

Salim Sazzed
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Salim Sazzed .

Editor information

Editors and Affiliations

Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais
University of Derby, Derby, UK
Farid Meziane
German Research Center for Artificial Intelligence, Saarbrücken, Germany
Helmut Horacek
University of Hertfordshire, Hatfield, UK
Epaminondas Kapetanios

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sazzed, S. (2021). Improving Sentiment Classification in Low-Resource Bengali Language Utilizing Cross-Lingual Self-supervised Learning. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science(), vol 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_20

Download citation

DOI: https://doi.org/10.1007/978-3-030-80599-9_20
Published: 20 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80598-2
Online ISBN: 978-3-030-80599-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics