SlangSD: building, expanding and using a sentiment dictionary of slang words for short-text sentiment classification

  • Project Notes
Language Resources and Evaluation

Abstract

Sentiment information about social media posts is increasingly considered an important resource for customer segmentation, market understanding, and tackling other socio-economic issues. However, sentiment in social media is difficult to measure because user-generated content is usually short and informal. Although many traditional sentiment analysis methods have been proposed, identifying slang sentiment words remains a challenging task for practitioners. Existing sentiment lexicons cover some slang words, but new slang keeps appearing as memes emerge, so a dedicated lexicon would be useful for researchers and practitioners. To this end, we propose to build a slang sentiment dictionary to aid sentiment analysis. Collecting a comprehensive list of slang words and labeling their sentiment polarity by hand is laborious and time-consuming. We therefore present an approach that leverages web resources to construct a Slang Sentiment Dictionary (SlangSD) that is easy to expand. SlangSD is publicly available for research purposes. We empirically show the advantages of using SlangSD for sentiment classification, and provide examples demonstrating its ease of use with a sentiment analysis system.
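
As a rough illustration of how a slang sentiment lexicon could be used by a lexicon-based classifier, the sketch below sums per-term scores over a short text and maps the sign of the total to a label. The tab-separated `term<TAB>score` layout, the sample entries and their scores, and the names `load_lexicon` and `classify` are assumptions made for this example. They are not the published SlangSD file format or the authors' classification method.

```python
import io
import re

# Hypothetical lexicon content: one "term<TAB>integer score" pair per line.
# The entries and scores below are invented for illustration only.
SAMPLE_LEXICON = "ugh\t-1\nfml\t-2\nshit-hot\t2\nout of the park\t2\n"


def load_lexicon(handle):
    """Read tab-separated term/score lines into a dict keyed by lowercase term."""
    lexicon = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        term, score = line.rsplit("\t", 1)
        lexicon[term.lower()] = int(score)
    return lexicon


def classify(text, lexicon, max_len=4):
    """Sum the scores of matched terms (longest phrase first) and label by sign."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    total, i = 0, 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                total += lexicon[phrase]
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    if total > 0:
        return "positive", total
    if total < 0:
        return "negative", total
    return "neutral", total


if __name__ == "__main__":
    lexicon = load_lexicon(io.StringIO(SAMPLE_LEXICON))
    for post in ["Ugh, FML", "He hit it out of the park", "See you tomorrow"]:
        print(post, "->", classify(post, lexicon))
```

Matching multi-word phrases before single tokens matters for slang, since an expression such as "out of the park" carries sentiment that none of its individual words does. A real system would also need to handle negation, emoticons and spelling variants, which this sketch ignores.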

Notes

  1. http://www.urbandictionary.com/define.php?term=out%20of%20the%20park.

  2. http://www.urbandictionary.com/define.php?term=shit-hot.

  3. http://www.urbandictionary.com/define.php?term=rocked%20his%20world.

  4. http://www.urbandictionary.com/define.php?term=rocked+her+world.

  5. http://www.urbandictionary.com/.

  6. http://www.slangsd.com/index.jsp.

  7. https://www.nytimes.com/2014/01/04/technology/a-lexicon-of-the-internet-updated-by-its-users.html.

  8. http://www.urbandictionary.com/define.php?term=Lol.

  9. https://dev.twitter.com/rest/public/search/.

  10. http://www.urbandictionary.com/yesterday.php?date=2016-07-14.

  11. https://dev.twitter.com/rest/public/search/.

  12. https://dev.twitter.com/streaming/reference/post/statuses/filter.

  13. https://en.wikipedia.org/wiki/List_of_emoticons.

  14. http://alt.qcri.org/semeval2014/task9/.

  15. https://www.dropbox.com/sh/qikgyyl1w3jf7gx/AADUbKkGLGZR9a8u4H7fE02ma?dl=0.

  16. http://www.urbandictionary.com/define.php?term=Ugh.

  17. http://www.urbandictionary.com/define.php?term=FML.

  18. http://www.urbandictionary.com/define.php?term=hon.

  19. http://sentistrength.wlv.ac.uk/download.html.

  20. http://slangsd.com/data/SlangSD.zip.

  21. http://sentistrength.wlv.ac.uk/#About.

Acknowledgements

We would like to thank DMML lab members for their feedback and help in this work. The work is funded, in part, by ONR N00014-16-1-2257 and the Department of Defense under the MINERVA initiative through the ONR N000141310835.

Author information

Corresponding author

Correspondence to Liang Wu.

About this article

Cite this article

Wu, L., Morstatter, F. & Liu, H. SlangSD: building, expanding and using a sentiment dictionary of slang words for short-text sentiment classification. Lang Resources & Evaluation 52, 839–852 (2018). https://doi.org/10.1007/s10579-018-9416-0

  • DOI: https://doi.org/10.1007/s10579-018-9416-0
