Sentiment Extraction from Tweets: Multilingual Challenges

Makrynioti, Nantia; Vassalos, Vasilis

doi:10.1007/978-3-319-22729-0_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9263)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Every day, users of social networks and microblogging services share their views on products, companies and movies, as well as their emotions on a variety of topics. As these services become more popular, the need to mine and analyze their content grows. We study the task of sentiment analysis on the well-known social network Twitter (https://twitter.com/). We present a case study on tweets written in Greek and propose an effective method that categorizes Greek tweets as positive, negative or neutral according to their sentiment. We validate our method’s effectiveness on both Greek and English to check its robustness to multilingual challenges, and present the first multilingual comparative study with three pre-existing state-of-the-art techniques for Twitter sentiment extraction on English tweets. Finally, we examine the importance of different preprocessing techniques in different languages. Our technique outperforms two of the three methods we compared against and is on a par with the best of them, while needing significantly less time for training and prediction.

This research has been co-financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) – Research Funding Program: Thales. Investing in knowledge society through the European Social Fund.

Notes

1. Data are available by emailing the authors.
2. https://dev.twitter.com/.
3. List of positive emoticons: :-), :), :o), :], :3, :c), :>, =], 8), =), :}, :^), <3, ^_^, ;>, (:, ;), (;, :d, :D.
4. List of negative emoticons: >:[, :-(, :(, :-c, :c, :-<, :<, :-[, :[, :{, :'(, :/.
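
The two emoticon lists in Notes 3 and 4 lend themselves to a simple lookup. This preview does not say how the authors use them, so the following Python sketch is only an illustration of one common use: counting emoticon matches per tweet and deriving a coarse polarity hint. The names `emoticon_counts` and `emoticon_polarity`, the whitespace tokenization, and the majority rule are all assumptions made for this example, not the paper's method.

```python
from __future__ import annotations

# Emoticon sets transcribed from Notes 3 and 4, written in plain ASCII.
POSITIVE_EMOTICONS = {
    ":-)", ":)", ":o)", ":]", ":3", ":c)", ":>", "=]", "8)", "=)",
    ":}", ":^)", "<3", "^_^", ";>", "(:", ";)", "(;", ":d", ":D",
}
NEGATIVE_EMOTICONS = {
    ">:[", ":-(", ":(", ":-c", ":c", ":-<", ":<", ":-[", ":[", ":{",
    ":'(", ":/",
}


def emoticon_counts(tweet: str) -> tuple[int, int]:
    """Return (positive, negative) emoticon counts over whitespace-separated tokens."""
    tokens = tweet.split()
    positive = sum(token in POSITIVE_EMOTICONS for token in tokens)
    negative = sum(token in NEGATIVE_EMOTICONS for token in tokens)
    return positive, negative


def emoticon_polarity(tweet: str) -> str:
    """Hypothetical heuristic: majority emoticon polarity; 'neutral' on ties or no match."""
    positive, negative = emoticon_counts(tweet)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"
```

Matching whole whitespace-separated tokens keeps `:/` from firing inside URLs such as `https://twitter.com/`, at the cost of missing emoticons glued to a word (for example `great:)`). A tweet-aware tokenizer would be the natural next step if that trade-off matters.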

Author information

Authors and Affiliations

Athens University of Economics and Business, 76 Patission Street, GR10434, Athens, Greece
Nantia Makrynioti & Vasilis Vassalos

Corresponding author

Correspondence to Nantia Makrynioti.

Editor information

Editors and Affiliations

University of Science and Technology, Rolla, Missouri, USA
Sanjay Madria
Osaka University, Osaka, Japan
Takahiro Hara

About this paper

Cite this paper

Makrynioti, N., Vassalos, V. (2015). Sentiment Extraction from Tweets: Multilingual Challenges. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_11

DOI: https://doi.org/10.1007/978-3-319-22729-0_11
Published: 05 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer Science, Computer Science (R0)
