An Apache Spark Implementation for Sentiment Analysis on Twitter Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10230)

Abstract

Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity and volume of the data. In this work, we implement a system on Apache Spark, an open-source framework for programming with Big Data. The sentiment analysis tool is based on Machine Learning methodologies alongside Natural Language Processing techniques and utilizes Apache Spark’s machine learning library, MLlib. To address the nature of Big Data, we introduce some pre-processing steps for achieving better results in sentiment analysis. The classification algorithms are used for both binary and ternary classification, and we examine the effect of the dataset size as well as the features of the input on the quality of results. Finally, the proposed system was trained and validated with real data crawled from Twitter, and the results are then compared with those from real users.
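
The pipeline outlined in the abstract (pre-process tweets, then train a classifier for binary or ternary labels) can be illustrated with a minimal sketch. It is written in plain Python rather than Spark MLlib so that it runs without a cluster. The abstract names neither the pre-processing steps nor the classifiers, so the cleanup rules, the Naive Bayes stand-in, the toy tweets, and all names below (`preprocess`, `NaiveBayes`, `train`, `BINARY_TRAIN`, `TERNARY_TRAIN`) are assumptions for illustration only, not the paper's method.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical cleanup rules; the abstract does not list the paper's actual steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more repeats of one character
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def preprocess(tweet):
    """Lowercase a tweet, mask URLs and mentions, and return its tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" url ", text)
    text = MENTION_RE.sub(" user ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop '#'
    text = REPEAT_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    return TOKEN_RE.findall(text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over any label set.

    Used here only as a simple stand-in classifier (an assumption).
    """

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(preprocess(tweet))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tweet):
        tokens = preprocess(tweet)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written tweets (not data from the paper).
BINARY_TRAIN = [
    ("I love this phone, it is great", "positive"),
    ("what a great happy day", "positive"),
    ("I hate this awful traffic", "negative"),
    ("this is the worst, so sad", "negative"),
]
# Ternary classification adds a third, neutral class to the same setup.
TERNARY_TRAIN = BINARY_TRAIN + [
    ("the meeting is at noon", "neutral"),
    ("flight lands at gate 4", "neutral"),
]


def train(examples):
    """Fit a classifier on (tweet, label) pairs."""
    tweets, labels = zip(*examples)
    return NaiveBayes().fit(tweets, labels)


binary_model = train(BINARY_TRAIN)
ternary_model = train(TERNARY_TRAIN)
print(binary_model.predict("what a great day"))
print(ternary_model.predict("meeting at gate 4"))
```

The same classifier code serves both settings because the label set is inferred from the training data; only the training examples change between the binary and ternary cases.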

Notes

  1. http://spark.apache.org/

  2. http://spark.apache.org/mllib/

  3. http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/

  4. https://www.crowdflower.com/data-for-everyone/

  5. http://sentipoll.herokuapp.com/

Author information

Corresponding author

Correspondence to Andreas Kanavos.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Baltas, A., Kanavos, A., Tsakalidis, A.K. (2017). An Apache Spark Implementation for Sentiment Analysis on Twitter Data. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_2

  • DOI: https://doi.org/10.1007/978-3-319-57045-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57044-0

  • Online ISBN: 978-3-319-57045-7

  • eBook Packages: Computer Science (R0)
