Multi-language Sentiment Analysis – Lesson Learnt from NLP Case Study

Maślankowski, Jacek; Majewicz, Dorota

doi:10.1007/978-3-030-95947-0_4

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 437)

Included in the following conference series:

European, Mediterranean, and Middle Eastern Conference on Information Systems

Abstract

This paper presents the use of sentiment analysis for texts in both Polish and English. The work was prompted by the authors' observation that social media and websites in Poland contain many sentences in both languages. The paper describes the inflectional forms that should be used when preparing the training dataset, and one of its goals is to identify problems that an analyst applying machine learning methods for sentiment analysis may misinterpret. The motivation of the study was to see whether the same methods could be used to analyse sentiment in different languages. We therefore evaluated the possibility of using a single sentiment evaluation mechanism, assuming similarly prepared training sets. In addition, the article outlines the principles of, and differences between, these languages, including the possibility of identifying gender from the text. We present the results of a case study showing how machine learning tools treat unstructured data to find the right sentiment and what problems can arise when text is supplied in these two languages. The study also shows that Big Data sources, such as comments on websites or social media, can be used to identify sentiment correctly, although this is not always the case if the training set is not prepared properly.
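
The point about inflectional forms in the training set can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier, so a small bag-of-words naive Bayes model is assumed here, and the training sentences, labels, and all names (`TRAIN`, `NaiveBayesSentiment`, `unseen`) are hypothetical. It trains one model on similarly prepared Polish and English samples and then shows that a Polish adjective in a different gender form is treated as an unknown token.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens; \\w is Unicode-aware, so Polish letters survive."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes over a bag-of-words with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen with that label
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def unseen(self, text):
        """Return the tokens of `text` that never occurred in the training set."""
        return [t for t in tokenize(text) if t not in self.vocab]

    def predict(self, text):
        # Tokens absent from the training vocabulary carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written samples; the Polish adjectives appear only in the masculine form.
TRAIN = [
    ("This hotel is great", "pos"),
    ("I love this phone", "pos"),
    ("Terrible service, very bad", "neg"),
    ("I hate this awful film", "neg"),
    ("Ten hotel jest dobry", "pos"),
    ("Bardzo dobry telefon", "pos"),
    ("Fatalna obsługa, bardzo zły hotel", "neg"),
    ("Ten film jest zły", "neg"),
]

clf = NaiveBayesSentiment().fit(TRAIN)

print(clf.predict("great phone"))                 # English input
print(clf.predict("zły film"))                    # Polish input, forms seen in training
print(clf.unseen("Ta restauracja jest dobra"))    # feminine form "dobra" was never seen
```

In this sketch the feminine form "dobra" shares no token with the masculine "dobry" from the training data, so the model has no sentiment evidence for it. Covering the relevant inflectional forms in the training set, or normalising them beforehand, is one way to address this under these assumptions.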

Author information

Authors and Affiliations

University of Gdańsk, ul. Armii Krajowej 101, 81-824, Sopot, Poland
Jacek Maślankowski
The Academy of Tourism and Hotel Management in Gdansk, ul. Miszewskiego 12/13, 80-239, Gdańsk, Poland
Dorota Majewicz

Corresponding author

Correspondence to Jacek Maślankowski.

Editor information

Editors and Affiliations

University of Nicosia, Nicosia, Cyprus
Marinos Themistocleous
British University in Dubai, Dubai, United Arab Emirates
Maria Papadaki

About this paper

Cite this paper

Maślankowski, J., Majewicz, D. (2022). Multi-language Sentiment Analysis – Lesson Learnt from NLP Case Study. In: Themistocleous, M., Papadaki, M. (eds) Information Systems. EMCIS 2021. Lecture Notes in Business Information Processing, vol 437. Springer, Cham. https://doi.org/10.1007/978-3-030-95947-0_4

DOI: https://doi.org/10.1007/978-3-030-95947-0_4
Published: 16 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95946-3
Online ISBN: 978-3-030-95947-0
eBook Packages: Computer Science (R0)
