Semantic Twitter sentiment analysis based on a fuzzy thesaurus

Soft Computing

Abstract

We define a new, fully automated, domain-independent method for building feature vectors from a Twitter text corpus for machine-learning sentiment analysis, based on a fuzzy thesaurus and sentiment replacement. Instead of using term-presence or term-frequency feature vectors, the proposed method measures the semantic similarity of Tweets to the features in the feature space. It therefore accounts for the sentiment of the context rather than merely counting sentiment words. We use sentiment replacement to reduce the dimensionality of the feature space and a fuzzy thesaurus to incorporate semantics. Experimental results show that sentiment replacement reduces the dimensionality of the feature space by up to 35%. Moreover, feature vectors built from the fuzzy thesaurus improve sentiment classification performance, with multinomial naïve Bayes and support vector machine classifiers reaching accuracies of 83% and 85%, respectively, on the Stanford testing dataset. Incorporating the fuzzy thesaurus gave the best accuracy, an increase of more than 3% over the baselines. Comparable results were obtained on a larger dataset, STS-Gold, indicating the robustness of the proposed method. Furthermore, a comparison with previous work shows that the proposed method outperforms other methods reported in the literature on the same benchmark data.
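
The two ideas named in the abstract, sentiment replacement and fuzzy-thesaurus feature vectors, can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' implementation: the tiny lexicon and the `POS`/`NEG` placeholders are hypothetical, the term correlation is the co-occurrence ratio commonly used for keyword connection matrices in fuzzy set information retrieval, and the membership degree is the usual complement-of-product combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon; the paper's actual sentiment resources
# are not described in the abstract.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "awful", "terrible"}


def replace_sentiment(tokens):
    """Map each lexicon word to a shared placeholder to shrink the vocabulary."""
    out = []
    for tok in tokens:
        if tok in POSITIVE:
            out.append("POS")
        elif tok in NEGATIVE:
            out.append("NEG")
        else:
            out.append(tok)
    return out


def build_thesaurus(docs):
    """Assumed correlation: c(a, b) = n_ab / (n_a + n_b - n_ab),
    where n_x counts the documents containing x."""
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))
    corr = {}
    for (a, b), n_ab in pair_freq.items():
        c = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        corr[(a, b)] = corr[(b, a)] = c
    return corr


def membership(feature, tokens, corr):
    """Fuzzy degree to which a tweet relates to a feature:
    1 - product over the tweet's tokens of (1 - c(feature, token))."""
    miss = 1.0
    for tok in tokens:
        c = 1.0 if tok == feature else corr.get((feature, tok), 0.0)
        miss *= 1.0 - c
    return 1.0 - miss


def feature_vector(tokens, features, corr):
    """One membership degree per feature, instead of presence or frequency."""
    return [membership(f, tokens, corr) for f in features]


# Toy corpus (invented for illustration only).
tweets = [
    ["i", "love", "this", "phone"],
    ["i", "hate", "this", "phone"],
    ["great", "phone"],
    ["awful", "battery"],
]
docs = [replace_sentiment(t) for t in tweets]
corr = build_thesaurus(docs)
features = sorted({tok for doc in docs for tok in doc})
vec = feature_vector(["nice", "phone"], features, corr)
```

In this toy corpus, replacement shrinks the vocabulary from eight terms to six, and the unseen tweet "nice phone" receives a non-zero degree for `POS` only because "phone" co-occurs with positive words. That is the sense in which such a vector reflects the sentiment of the context rather than a count of sentiment words. The resulting vectors could then be passed to any standard classifier.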

Notes

  1. The STS-Gold dataset can be requested from the authors at: http://kmi.open.ac.uk/people/member/hassan-saif

  2. Stanford dataset official page: http://help.sentiment140.com/for-students

  3. Stanford testing and training datasets can be downloaded from: https://docs.google.com/file/d/0B04GJPshIjmPRnZManQwWEdTZjg/edit

Author information

Corresponding author

Correspondence to Heba M. Ismail.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

Not applicable.

Informed consent

Not applicable.

Additional information

Communicated by S. Deb, T. Hanne, K.C. Wong.

About this article

Cite this article

Ismail, H.M., Belkhouche, B. & Zaki, N. Semantic Twitter sentiment analysis based on a fuzzy thesaurus. Soft Comput 22, 6011–6024 (2018). https://doi.org/10.1007/s00500-017-2994-8

  • DOI: https://doi.org/10.1007/s00500-017-2994-8
