Multi-language Sentiment Analysis – Lesson Learnt from NLP Case Study

  • Conference paper
Information Systems (EMCIS 2021)

Abstract

This paper presents the use of sentiment analysis for texts in both Polish and English. The motivation is our observation that social media and websites in Poland contain many sentences in both languages. The paper describes the various inflectional forms that should be included when preparing the training dataset. One of its goals is therefore to identify problems that an analyst applying machine learning methods for sentiment analysis may misinterpret. The study was motivated by the question of whether the same methods can be used to analyse sentiment in different languages. We evaluate the possibility of using a single sentiment evaluation mechanism, assuming similarly prepared training sets. In addition, the article discusses the principles of and differences between these languages, including the possibility of identifying gender from the text. We present the results of a case study showing how machine learning tools treat unstructured data to find the right sentiment, and what problems can arise when the text is delivered in these two languages. The study also shows that Big Data sources, such as comments on websites or social media, can be used to identify sentiment correctly, although this is not always the case if the training set is not prepared properly.
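The point about inflectional forms in the training set can be illustrated with a minimal sketch. It is not the authors' pipeline: the `bag_of_words` helper and the toy sentences are hypothetical and chosen only for illustration. A plain word-count model sees one English adjective as a single feature, while the gender-inflected Polish forms of the same adjective become separate features.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens (\\w is Unicode-aware in Python 3)."""
    return Counter(re.findall(r"\w+", text.lower()))


# Hypothetical toy sentences, not taken from the paper's dataset.
english = "good phone, good screen, good price"
polish = "dobry telefon, dobra cena, dobre etui"  # masculine, feminine, neuter forms

en_counts = bag_of_words(english)
pl_counts = bag_of_words(polish)

# English: one feature carries all three occurrences.
print(en_counts["good"])  # 3

# Polish: the same adjective is split across three distinct features.
pl_forms = [word for word in pl_counts if word.startswith("dobr")]
print(pl_forms)  # ['dobry', 'dobra', 'dobre']
```

Under these assumptions, a training set that contains only one of the Polish forms would leave the others unseen, which is one way an improperly prepared training set could lead to misclassified sentiment.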

Correspondence to Jacek Maślankowski.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Maślankowski, J., Majewicz, D. (2022). Multi-language Sentiment Analysis – Lesson Learnt from NLP Case Study. In: Themistocleous, M., Papadaki, M. (eds) Information Systems. EMCIS 2021. Lecture Notes in Business Information Processing, vol 437. Springer, Cham. https://doi.org/10.1007/978-3-030-95947-0_4

  • DOI: https://doi.org/10.1007/978-3-030-95947-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95946-3

  • Online ISBN: 978-3-030-95947-0

  • eBook Packages: Computer Science, Computer Science (R0)
