Abstract
Sentiment information about social media posts is increasingly regarded as an important resource for customer segmentation, market understanding, and tackling other socio-economic issues. However, sentiment in social media is difficult to measure because user-generated content is usually short and informal. Although many traditional sentiment analysis methods have been proposed, identifying slang sentiment words remains a challenging task for practitioners. Some slang words appear in existing sentiment lexicons, but because new slang is generated with emerging memes, a dedicated lexicon would be useful for researchers and practitioners. To this end, we propose building a slang sentiment dictionary to aid sentiment analysis. Because collecting a comprehensive list of slang words and labeling their sentiment polarity is laborious and time-consuming, we present an approach that leverages web resources to construct a Slang Sentiment Dictionary (SlangSD) that is easy to expand. SlangSD is publicly available for research purposes. We empirically show the advantages of using SlangSD, the newly built slang sentiment word dictionary, for sentiment classification, and provide examples demonstrating its ease of use with a sentiment analysis system.


Acknowledgements
We would like to thank DMML lab members for their feedback and help in this work. The work is funded, in part, by ONR N00014-16-1-2257 and the Department of Defense under the MINERVA initiative through the ONR N000141310835.
Cite this article
Wu, L., Morstatter, F. & Liu, H. SlangSD: building, expanding and using a sentiment dictionary of slang words for short-text sentiment classification. Lang Resources & Evaluation 52, 839–852 (2018). https://doi.org/10.1007/s10579-018-9416-0