
A System for Automatic Classification of Twitter Messages into Categories

  • Conference paper
Modeling and Using Context (CONTEXT 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9405)

Abstract

Twitter is a widely used online social networking site where users post short messages limited to 140 characters. The short length of these messages makes it challenging to classify them into categories. In this paper we propose a system that automatically classifies Twitter messages into a set of predefined categories. The system takes into account not only the tweet text but also external features such as words from linked URLs, mentioned user profiles, and Wikipedia articles. The system is evaluated using various combinations of feature sets. According to our results, the highest accuracy, 90.8 %, is achieved when the original tweet terms are combined with user profile terms and terms extracted from linked URLs. Including terms from Wikipedia pages, found specifically for each tweet, decreases accuracy on the original test set; however, accuracy increases on a subset of the original test set containing only tweets without URLs.
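The idea of evaluating different feature-set combinations can be illustrated with a minimal sketch. It is not the authors' implementation: the source names, the tokenizer, and the example texts below are all hypothetical, and the abstract does not say how terms are extracted or weighted. The sketch only shows how terms from the selected sources might be merged into a single bag of words before classification.

```python
import re
from collections import Counter

# Hypothetical source names; the abstract does not give the system's identifiers.
SOURCES = ("tweet", "url", "profile", "wikipedia")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def combine_features(texts_by_source, enabled):
    """Merge the terms of the enabled sources into one bag of words."""
    unknown = set(enabled) - set(SOURCES)
    if unknown:
        raise ValueError(f"unknown feature sources: {sorted(unknown)}")
    bag = Counter()
    for source in enabled:
        # A source with no text for this tweet simply contributes no terms.
        bag.update(tokenize(texts_by_source.get(source, "")))
    return bag


# Made-up example texts, for illustration only.
example = {
    "tweet": "Great match tonight",
    "url": "Football league final match report",
    "profile": "Official football club account",
    "wikipedia": "Association football",
}

# The combination the abstract reports as most accurate: tweet + profile + URL terms.
best_combo = combine_features(example, ["tweet", "url", "profile"])
```

Swapping the `enabled` list (for instance, adding `"wikipedia"`) yields the other combinations that such an evaluation would compare.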

Notes

  1. The NReadability library is used to take only the article text from a specific HTML file, ignoring irrelevant text such as sidebar text containing other stories. https://github.com/marek-stoj/NReadability [Online; accessed 6-June-2015].

  2. https://tweetinvi.codeplex.com [Online; accessed 6-June-2015].

  3. The database contains every Wikipedia page title from the English Wikipedia, originally taken from the April 2015 Wikipedia dump. https://dumps.wikimedia.org/enwiki/20150304/ [Online; accessed 6-June-2015].

  4. https://tweetinvi.codeplex.com [Online; accessed 6-June-2015].

Author information

Corresponding author

Correspondence to Athena Stassopoulou.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Theodotou, A., Stassopoulou, A. (2015). A System for Automatic Classification of Twitter Messages into Categories. In: Christiansen, H., Stojanovic, I., Papadopoulos, G. (eds) Modeling and Using Context. CONTEXT 2015. Lecture Notes in Computer Science, vol 9405. Springer, Cham. https://doi.org/10.1007/978-3-319-25591-0_44

  • DOI: https://doi.org/10.1007/978-3-319-25591-0_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25590-3

  • Online ISBN: 978-3-319-25591-0

  • eBook Packages: Computer Science, Computer Science (R0)
