Abstract
This paper is a comparative study of text feature extraction methods for statistical learning in sentiment classification. Feature extraction is one of the most important steps in classification systems. We compare stylometry with two baseline methods, TF-IDF and Delta TF-IDF, for sentiment classification. Stylometry is a research area of linguistics that uses statistical techniques to analyze literary style. To assess the viability of stylometry, we build a corpus of product reviews from Buscapé, the most traditional Portuguese-language online review service, collecting 2,000 reviews about smartphones. We use three classifiers, Support Vector Machine (SVM), Naive Bayes, and J48, to evaluate whether stylometry achieves higher accuracy than TF-IDF and Delta TF-IDF. The best results were obtained with the SVM classifier: 82.75% accuracy with stylometry, versus 72.62% with Delta TF-IDF and 56.25% with TF-IDF. These results show that stylometry is a feasible method for sentiment classification, outperforming the baseline methods in accuracy, and that the approach is promising.
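The Python sketch below contrasts the two baseline weighting schemes named in the abstract and adds a few simple stylometric measures. Classic TF-IDF multiplies a term's count by log2(N / df). Delta TF-IDF, as formulated by Martineau and Finin, instead multiplies the count by the difference between the term's IDF in the positive and negative training sets, i.e. log2((|P| · N_t) / (P_t · |N|)). This is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the add-one smoothing of P_t and N_t, and the particular style features (average word length, type-token ratio, punctuation rates) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption for illustration;
    # punctuation stays attached to words).
    return text.lower().split()


def tf_idf(docs):
    """Classic TF-IDF: raw term count times log2(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: c * math.log2(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def delta_idf(pos_docs, neg_docs):
    """Per-term weight log2((|P| * N_t) / (P_t * |N|)).
    Add-one smoothing on P_t and N_t is an assumption that avoids
    division by zero for terms seen in only one class."""
    pos_df = Counter(t for d in pos_docs for t in set(tokenize(d)))
    neg_df = Counter(t for d in neg_docs for t in set(tokenize(d)))
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    return {
        t: math.log2((n_pos * (neg_df[t] + 1)) / ((pos_df[t] + 1) * n_neg))
        for t in set(pos_df) | set(neg_df)
    }


def delta_tf_idf(doc, weights):
    """Delta TF-IDF vector: term count times its class-difference weight.
    Terms unseen during training get weight 0."""
    counts = Counter(tokenize(doc))
    return {t: c * weights.get(t, 0.0) for t, c in counts.items()}


def stylometric_features(text):
    """Hypothetical style markers, not the paper's feature set."""
    words = tokenize(text)
    n_words = len(words) or 1  # guard against empty input
    n_chars = len(text) or 1
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,
        "exclamation_rate": text.count("!") / n_chars,
        "question_rate": text.count("?") / n_chars,
    }
```

For example, with positive reviews `["great phone great camera", "great battery"]` and negative reviews `["bad battery", "bad screen bad camera"]`, `delta_idf` gives "battery" and "camera" a weight of 0 because they occur equally in both classes. It gives "great" and "bad" weights of equal magnitude and opposite sign. With this sign convention, terms concentrated in positive reviews receive negative weights. Plain TF-IDF, by contrast, ignores the class labels entirely.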
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Anchiêta, R.T., Neto, F.A.R., de Sousa, R.F., Moura, R.S. (2015). Using Stylometric Features for Sentiment Classification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol. 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_15
DOI: https://doi.org/10.1007/978-3-319-18117-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science, Computer Science (R0)