Abstract
The aim of this paper is to present the use of sentiment analysis in Polish and English. This goal stems from the authors' observation that many sentences in both Polish and English appear in social media and on websites in Poland. The paper presents the principles governing the various inflectional forms that should be taken into account when preparing the training dataset used in the analysis. Accordingly, one of the goals of this article is to identify possible problems that an analyst applying machine learning methods to sentiment analysis may misinterpret. The study was motivated by the question of whether the same methods could be used to analyse sentiment in different languages. We therefore evaluated whether a single sentiment evaluation mechanism could be used, assuming similarly prepared training sets. In addition, the article describes the principles of and differences between these languages, including whether gender can be identified from the text. We present the results of a case study showing how machine learning tools process unstructured data to determine the correct sentiment and what problems arise when text is supplied in these two languages. The study also showed that Big Data sources, such as comments on websites or social media, can be used to identify sentiment correctly, although this is not always the case if the training set is not prepared properly.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Maślankowski, J., Majewicz, D. (2022). Multi-language Sentiment Analysis – Lesson Learnt from NLP Case Study. In: Themistocleous, M., Papadaki, M. (eds) Information Systems. EMCIS 2021. Lecture Notes in Business Information Processing, vol 437. Springer, Cham. https://doi.org/10.1007/978-3-030-95947-0_4
DOI: https://doi.org/10.1007/978-3-030-95947-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95946-3
Online ISBN: 978-3-030-95947-0
eBook Packages: Computer Science, Computer Science (R0)