Abstract
Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity, and volume of the data. In this work, we implement a system on Apache Spark, an open-source framework for programming with Big Data. The sentiment analysis tool is based on Machine Learning methodologies together with Natural Language Processing techniques and uses Apache Spark’s machine learning library, MLlib. To address the nature of Big Data, we introduce several pre-processing steps that lead to better sentiment analysis results. The classification algorithms are used for both binary and ternary classification, and we examine the effect of the dataset size as well as the features of the input on the quality of results. Finally, the proposed system was trained and validated with real data crawled from Twitter, and the results are then compared with those from real users.
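The abstract does not enumerate the pre-processing steps, so the following is only a minimal sketch of the kind of tweet cleaning that commonly precedes feature extraction. The specific rules (dropping URLs and user mentions, keeping hashtag words, collapsing elongated characters) and the names `preprocess` and `bag_of_words` are assumptions made for illustration, not the authors' method. It uses plain Python rather than Spark so that it runs without a cluster.

```python
import re
from typing import Dict, List

# Hypothetical cleaning rules; the abstract does not list the actual steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet: str) -> List[str]:
    """Normalize one tweet into a list of lowercase tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    text = text.replace("#", " ")         # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)   # collapse elongation to two characters
    return TOKEN_RE.findall(text)


def bag_of_words(tokens: List[str]) -> Dict[str, int]:
    """Count token occurrences, a simple feature representation."""
    counts: Dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts
```

In a Spark setting, a function such as `preprocess` would typically be applied to each record of a distributed dataset before the resulting features are passed to an MLlib classifier; that wiring is omitted here.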
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Baltas, A., Kanavos, A., Tsakalidis, A.K. (2017). An Apache Spark Implementation for Sentiment Analysis on Twitter Data. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_2
DOI: https://doi.org/10.1007/978-3-319-57045-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57044-0
Online ISBN: 978-3-319-57045-7
eBook Packages: Computer Science (R0)