Sentiment analysis and spam detection in short informal text using learning classifier systems

Arif, Muhammad Hassan; Li, Jianxin; Iqbal, Muhammad; Liu, Kaixu

doi:10.1007/s00500-017-2729-x

Soft Computing, Volume 22, pages 7281–7291 (2018)
Methodologies and Application
Published: 15 July 2017

Abstract

Sentiment analysis of public opinion and spam detection in social media text messages are two challenging data analysis tasks, because the text involved is short and informal. This paper investigates the performance of learning classifier systems (LCS), a family of rule-based machine learning techniques, on sentiment analysis of Twitter messages and movie reviews, and on spam detection in SMS and email data sets. In this study, an existing LCS technique is extended with a novel encoding scheme for representing classifier rules, designed to handle the sparseness of the feature vectors, which are generated using the term frequency-inverse document frequency (TF-IDF) of word n-grams together with sentiment lexicons. The results show that the proposed encoding scheme smoothed the learning process and produced consistently good results in all experiments conducted in this study.
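The abstract states that the feature vectors are built from the TF-IDF of word n-grams and that they are sparse. The following is a minimal sketch of why such vectors end up mostly zeros; it is not the authors' feature pipeline. It assumes whitespace tokenisation, unigrams and bigrams only, raw term counts as TF and log(N/df) as IDF, and it omits the sentiment-lexicon features entirely. The function names (`word_ngrams`, `tfidf_vectors`, `sparsity`) and the three example messages are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Build one TF-IDF vector per document over a shared n-gram vocabulary.

    Assumed weighting (the exact variant used in the paper is not given here):
    raw term count * log(N / document frequency), with no smoothing or normalisation.
    """
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram occurs.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    # Counter returns 0 for absent terms, so most entries of each vector are zero.
    vectors = [[c[term] * idf[term] for term in vocab] for c in counts]
    return vocab, vectors


def sparsity(vectors):
    """Fraction of entries that are exactly zero across all vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0.0)
    return zeros / total


if __name__ == "__main__":
    messages = [
        "loved this movie so much",
        "worst movie ever",
        "free prize call now",
    ]
    vocab, vecs = tfidf_vectors(messages)
    print(len(vocab), "features")
    print(f"sparsity: {sparsity(vecs):.2f}")
```

On these three toy messages the script prints `20 features` and `sparsity: 0.65`: each message contributes only its own handful of n-grams to the shared vocabulary. With a realistic corpus the vocabulary runs to thousands of n-grams while each short message still contains only a few, so the zero fraction is typically far higher. That is the sparseness the proposed rule encoding is meant to cope with.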

Acknowledgements

This work is supported by the NSFC program (Nos. 61472022, 61421003) and SKLSDE-2016ZX-11, and partly by the Beijing Advanced Innovation Center for Big Data and Brain Computing.

Author information

Authors and Affiliations

Advanced Innovation Center for Big Data and Brain Computing, School of Computer Science and Engineering, Beihang University (BUAA), Beijing, 100191, China
Muhammad Hassan Arif & Jianxin Li
Xtracta Ltd, Auckland, 1061, New Zealand
Muhammad Iqbal
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
Kaixu Liu

Corresponding author

Correspondence to Jianxin Li.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical standard

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Arif, M.H., Li, J., Iqbal, M. et al. Sentiment analysis and spam detection in short informal text using learning classifier systems. Soft Comput 22, 7281–7291 (2018). https://doi.org/10.1007/s00500-017-2729-x

Published: 15 July 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s00500-017-2729-x
