DOI: 10.1145/2245276.2245364
research-article

Towards building large-scale distributed systems for twitter sentiment analysis

Published: 26 March 2012

Abstract

In recent years, social networks have become very popular. Twitter, a micro-blogging service, is estimated to have about 200 million registered users, who create approximately 65 million tweets a day. Twitter users often express opinions about topics that interest them. One challenge is that each tweet is limited to 140 characters and is therefore very short; it may also contain slang and misspelled words. It is thus difficult to apply traditional NLP techniques, which are designed for formal language, to the Twitter domain. Another challenge is that the total volume of tweets is extremely high, so processing them takes a long time. In this paper, we describe a large-scale distributed system for real-time Twitter sentiment analysis. Our system consists of two components: a lexicon builder and a sentiment classifier. Both components can run on a large-scale distributed system because they are implemented using a MapReduce framework and a distributed database model. As a result, the lexicon builder and the sentiment classifier scale with the number of machines and the size of the data. Our experiments also show that the lexicon performs well in opinion extraction, and that the accuracy of the sentiment classifier can be improved by combining the lexicon with a machine learning technique.
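The abstract does not spell out the algorithms, so the following is only a minimal single-process sketch of how a lexicon builder and a lexicon-based classifier could be phrased as map and reduce steps. Everything in it is an assumption made for illustration: the function names (`map_tweet`, `reduce_counts`, `build_lexicon`, `classify`), the pre-labeled toy tweets, the whitespace tokenizer, and the scoring formula `(pos - neg) / (pos + neg)`. It is not the paper's method, and it does not use Hadoop or HBase.

```python
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real tweets need more care).
    return text.lower().split()


def map_tweet(text, label):
    # Map step: emit (token, (pos, neg)) once per token occurrence.
    # label > 0 means the tweet is treated as positive, otherwise negative.
    for token in tokenize(text):
        yield token, ((1, 0) if label > 0 else (0, 1))


def reduce_counts(pairs):
    # Reduce step: sum the positive and negative counts per token.
    totals = defaultdict(lambda: [0, 0])
    for token, (pos, neg) in pairs:
        totals[token][0] += pos
        totals[token][1] += neg
    return totals


def build_lexicon(labeled_tweets):
    # Hypothetical polarity score in [-1, 1]: (pos - neg) / (pos + neg).
    pairs = (
        pair
        for text, label in labeled_tweets
        for pair in map_tweet(text, label)
    )
    totals = reduce_counts(pairs)
    return {tok: (p - n) / (p + n) for tok, (p, n) in totals.items()}


def classify(text, lexicon):
    # Sum the lexicon scores of known tokens; unknown tokens count as 0.
    score = sum(lexicon.get(tok, 0.0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy labeled data (invented for this sketch, not from the paper).
sample_tweets = [
    ("i love this phone", 1),
    ("love the great weather", 1),
    ("i hate this traffic", -1),
    ("awful traffic again", -1),
]
lexicon = build_lexicon(sample_tweets)
print(classify("love great phone", lexicon))
print(classify("hate awful traffic", lexicon))
```

Because the map step handles each tweet independently and the reduce step only sums counts per token, the same structure could in principle be distributed across machines, which is the property the abstract attributes to the MapReduce-based design.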

Published In

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MapReduce
  2. opinion mining
  3. social networks
  4. twitter

Qualifiers

  • Research-article

Conference

SAC 2012: ACM Symposium on Applied Computing
March 26 - 30, 2012
Trento, Italy

Acceptance Rates

SAC '12 Paper Acceptance Rate 270 of 1,056 submissions, 26%;
Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
