Semantic Twitter sentiment analysis based on a fuzzy thesaurus

Ismail, Heba M.; Belkhouche, Boumediene; Zaki, Nazar

doi:10.1007/s00500-017-2994-8

Focus
Published: 05 January 2018

Volume 22, pages 6011–6024 (2018)
Soft Computing

Abstract

We define a new, fully automated and domain-independent method for building feature vectors from a Twitter text corpus for machine-learning sentiment analysis, based on a fuzzy thesaurus and sentiment replacement. The proposed method measures the semantic similarity of Tweets to the features in the feature space instead of using term-presence or term-frequency feature vectors. Thus, we account for the sentiment of the context instead of just counting sentiment words. We use sentiment replacement to reduce the dimensionality of the feature space and a fuzzy thesaurus to incorporate semantics. Experimental results show that sentiment replacement yields up to a 35% reduction in the dimensionality of the feature space. Moreover, feature vectors built from the fuzzy thesaurus improve sentiment classification performance with multinomial naïve Bayes and support vector machine classifiers, which reach accuracies of 83% and 85%, respectively, on the Stanford testing dataset. Incorporating the fuzzy thesaurus gave the best accuracy, an increase of more than 3% over the baselines. Comparable results were obtained with a larger dataset, STS-Gold, indicating the robustness of the proposed method. Furthermore, a comparison with previous work shows that the proposed method outperforms other methods reported in the literature on the same benchmark data.
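The abstract does not give the formulas behind the two steps, so the following is only a minimal sketch of how sentiment replacement and fuzzy-thesaurus feature vectors could fit together. It assumes a co-occurrence correlation commonly used in fuzzy-set information retrieval, c(a, b) = n_ab / (n_a + n_b - n_ab), and a tweet-to-feature membership of 1 - Π(1 - c(feature, w)) over the words w of the tweet. The lexicon, the placeholder tokens `POS` and `NEG`, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon (assumption, not the paper's resource).
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "awful", "sad"}


def replace_sentiment(tokens):
    """Map every lexicon word to a shared placeholder token."""
    out = []
    for t in tokens:
        if t in POSITIVE:
            out.append("POS")
        elif t in NEGATIVE:
            out.append("NEG")
        else:
            out.append(t)
    return out


def build_thesaurus(docs):
    """Return corr(a, b) = n_ab / (n_a + n_b - n_ab) from document co-occurrence."""
    df = Counter()  # number of documents containing each term
    co = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        co.update(combinations(terms, 2))

    def corr(a, b):
        if a == b:
            return 1.0 if df[a] else 0.0
        key = (a, b) if a < b else (b, a)
        n_ab = co[key]
        denom = df[a] + df[b] - n_ab
        return n_ab / denom if denom else 0.0

    return corr


def fuzzy_vector(tokens, features, corr):
    """Membership of a tweet in each feature: 1 - prod(1 - corr(feature, w))."""
    vec = []
    for f in features:
        miss = 1.0
        for w in set(tokens):
            miss *= 1.0 - corr(f, w)
        vec.append(1.0 - miss)
    return vec


# Toy corpus, invented for illustration only.
raw = [
    "i love this phone".split(),
    "great phone happy".split(),
    "i hate this battery".split(),
    "awful battery sad".split(),
]
docs = [replace_sentiment(d) for d in raw]
raw_vocab = {t for d in raw for t in d}
features = sorted({t for d in docs for t in d})
corr = build_thesaurus(docs)
vec = fuzzy_vector(replace_sentiment("love battery".split()), features, corr)
```

In this toy corpus the replacement step shrinks the vocabulary because six sentiment words collapse into two placeholders, and the resulting vector gives a nonzero weight to features such as `phone` that never appear in the input tweet but co-occur with its terms. That is the intuition behind measuring similarity to features rather than counting term presence; the actual thesaurus construction and weighting in the paper may differ.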

Notes

STS-Gold dataset can be requested from the authors at: http://kmi.open.ac.uk/people/member/hassan-saif
Stanford dataset official page: http://help.sentiment140.com/for-students
Stanford testing and training datasets can be downloaded from: https://docs.google.com/file/d/0B04GJPshIjmPRnZManQwWEdTZjg/edit

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, UAE
Heba M. Ismail, Boumediene Belkhouche & Nazar Zaki

Corresponding author

Correspondence to Heba M. Ismail.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

Not applicable.

Informed consent

Not applicable.

Additional information

Communicated by S. Deb, T. Hanne, K.C. Wong.

About this article

Cite this article

Ismail, H.M., Belkhouche, B. & Zaki, N. Semantic Twitter sentiment analysis based on a fuzzy thesaurus. Soft Comput 22, 6011–6024 (2018). https://doi.org/10.1007/s00500-017-2994-8

Issue Date: September 2018