A System for Automatic Classification of Twitter Messages into Categories

Theodotou, Alexandros; Stassopoulou, Athena

doi:10.1007/978-3-319-25591-0_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9405)

Included in the following conference series: International and Interdisciplinary Conference on Modeling and Using Context

Abstract

Twitter is a widely used online social networking site where users post short messages limited to 140 characters. The short length of these messages makes them difficult to classify into categories. In this paper we propose a system that automatically classifies Twitter messages into a set of predefined categories. The system takes into account not only the tweet text but also external features such as words from linked URLs, from the profiles of mentioned users, and from Wikipedia articles. The system is evaluated using various combinations of feature sets. According to our results, the highest accuracy, 90.8%, is achieved when the original tweet terms are combined with user profile terms and terms extracted from linked URLs. Including terms from Wikipedia pages found specifically for each tweet decreases accuracy on the original test set; however, accuracy increases on a subset of the original test set that contains only tweets without URLs.
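The idea of combining feature sets can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier, so a multinomial Naive Bayes with add-one smoothing is assumed here, and the field names `text`, `url`, and `profile`, the helper names, and the toy training tweets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_terms(tweet, use):
    """Concatenate the terms of the selected feature sources of one tweet.

    `tweet` is a dict with optional keys such as "text", "url", "profile"
    (hypothetical names); `use` selects which sources are combined.
    """
    terms = []
    for source in use:
        terms.extend(tokenize(tweet.get(source, "")))
    return terms


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier)."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # label -> term frequencies
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocab = set()

    def fit(self, docs, labels):
        for terms, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.term_counts[label].update(terms)
            self.vocab.update(terms)
        return self

    def predict(self, terms):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in terms:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_model(examples, use):
    """Train on (tweet, label) pairs using only the sources listed in `use`."""
    docs = [build_terms(tweet, use) for tweet, _ in examples]
    labels = [label for _, label in examples]
    return NaiveBayes().fit(docs, labels)


# Toy data invented for illustration only.
TRAIN = [
    ({"text": "Great goal tonight",
      "url": "football match report league",
      "profile": "sports reporter"}, "sports"),
    ({"text": "New phone announced",
      "url": "smartphone launch review processor",
      "profile": "tech blogger"}, "technology"),
]

USE = ("text", "url", "profile")  # tweet terms + URL terms + profile terms
model = train_model(TRAIN, USE)

new_tweet = {"text": "Watch this", "url": "league match highlights"}
print(model.predict(build_terms(new_tweet, USE)))
```

Here the tweet text alone ("Watch this") shares no terms with the training data, so the prediction rests entirely on the terms taken from the linked page. This is the situation the external feature sets are meant to address. Changing `USE` to other combinations gives a simple way to compare feature sets on the same data.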

Notes

1. The NReadability library is used to take only the article text from a specific HTML file, ignoring irrelevant text such as sidebar text containing other stories. https://github.com/marek-stoj/NReadability [Online; accessed 6-June-2015].
2. https://tweetinvi.codeplex.com [Online; accessed 6-June-2015].
3. The database contains every Wikipedia page title from the English Wikipedia, originally taken from the April 2015 Wikipedia dump. https://dumps.wikimedia.org/enwiki/20150304/ [Online; accessed 6-June-2015].
4. https://tweetinvi.codeplex.com [Online; accessed 6-June-2015].

Author information

Authors and Affiliations

Department of Computer Science, University of Nicosia, 1700, Nicosia, Cyprus
Alexandros Theodotou & Athena Stassopoulou

Corresponding author

Correspondence to Athena Stassopoulou.

Editor information

Editors and Affiliations

Department of Computer Science, Roskilde University, Roskilde, Denmark
Henning Christiansen
ENS-Pavillon Jardin, CNRS-Institut Jean-Nicod, Paris, France
Isidora Stojanovic
Department of Computer Science, University of Cyprus, Aglantzia, Cyprus
George A. Papadopoulos

Cite this paper

Theodotou, A., Stassopoulou, A. (2015). A System for Automatic Classification of Twitter Messages into Categories. In: Christiansen, H., Stojanovic, I., Papadopoulos, G. (eds) Modeling and Using Context. CONTEXT 2015. Lecture Notes in Computer Science, vol 9405. Springer, Cham. https://doi.org/10.1007/978-3-319-25591-0_44

DOI: https://doi.org/10.1007/978-3-319-25591-0_44
Published: 15 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25590-3
Online ISBN: 978-3-319-25591-0
eBook Packages: Computer Science, Computer Science (R0)
