Abstract
Alongside alternative text-based forms, emojis have become highly common in social media. Given their importance in daily communication, we tackled the problem of emoji prediction in Portuguese social media text. We created a dataset with occurrences of frequent emojis, used as labels, and then compared the performance of traditional machine learning algorithms with that of neural networks when predicting them. Whether considering the five or the ten most popular emojis, an LSTM neural network clearly outperformed Naive Bayes in the prediction task, with F1-scores of 60% and 52%, respectively, against 33% and 23%.
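As a rough illustration of the two steps named in the abstract (using emoji occurrences as labels, and scoring predictions with F1), the following is a minimal sketch rather than the authors' pipeline. The emoji list, the rule of keeping only posts with exactly one target emoji, the function names, and the choice of macro-averaged F1 are all assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter

# Hypothetical label set; the actual emojis used in the paper are not listed here.
TARGET_EMOJIS = ["😂", "😍", "😭", "🤔", "😊"]


def label_examples(texts, emojis=TARGET_EMOJIS):
    """Turn raw posts into (text_without_emoji, emoji_label) pairs.

    Assumption: a post is kept only if exactly one target emoji type occurs in it.
    """
    examples = []
    for text in texts:
        present = [e for e in emojis if e in text]
        if len(present) != 1:
            continue  # no target emoji, or more than one candidate label
        label = present[0]
        # Remove the label from the text and normalise whitespace.
        cleaned = " ".join(text.replace(label, " ").split())
        if cleaned:
            examples.append((cleaned, label))
    return examples


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over the labels present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = []
    for label in sorted(set(y_true)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


posts = ["Que dia lindo 😍", "kkkk não acredito 😂😂", "sem emoji aqui", "😂 e 😭 juntos"]
examples = label_examples(posts)
score = macro_f1(["😂", "😂", "😍", "😍"], ["😂", "😍", "😍", "😍"])
```

In this sketch the third post is dropped because it has no target emoji and the fourth because its label would be ambiguous. Any classifier, whether Naive Bayes or an LSTM, could then be trained on the resulting pairs and compared with the same scoring function.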
Notes
- 1. Real-time usage of emojis on Twitter is available at http://emojitracker.com.
- 2. Available at http://www.tweepy.org.
- 3. See https://scikit-learn.org/stable/tutorial/text_analytics/working_with_tex_data.html for using scikit-learn with textual data.
- 4. We used NLTK’s Portuguese stopword list, https://www.nltk.org.
- 5.
Acknowledgements
This work was developed in the scope of the SOCIALITE Project (PTDC/EEISCR/2072/2014), co-financed by COMPETE 2020, Portugal 2020 – Operational Program for Competitiveness and Internationalization (POCI), European Union’s ERDF (European Regional Development Fund), and the Portuguese Foundation for Science and Technology (FCT).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Duarte, L., Macedo, L., Gonçalo Oliveira, H. (2020). Emoji Prediction for Portuguese. In: Quaresma, P., Vieira, R., Aluísio, S., Moniz, H., Batista, F., Gonçalves, T. (eds) Computational Processing of the Portuguese Language. PROPOR 2020. Lecture Notes in Computer Science, vol 12037. Springer, Cham. https://doi.org/10.1007/978-3-030-41505-1_17
DOI: https://doi.org/10.1007/978-3-030-41505-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41504-4
Online ISBN: 978-3-030-41505-1
eBook Packages: Computer Science, Computer Science (R0)